The maze of impact metrics


In deciding how to judge the impact of research, evaluators must take into account the effects of 
emphasizing particular measures — and be open about their methods. 


So much science, so little time. Amid an ever-increasing mountain
of research articles, data sets and other output, hard-pressed
research funders and employers need shortcuts to identify and
reward the work that matters. They have plenty of options: research
impact is now recognized as a multidimensional affair.

The conventional measures of scholarly importance — citation metrics,
publication in influential journals and the opinion of peers as
expressed in letters and interviews — still loom large. But to those are
now added metrics such as article downloads and views, and measures
of importance beyond the academic realm, including influence on
policy-makers or health and environment officials, effects on industry
and the economy, and public outreach.

Researchers at the Center for the Study of Interdisciplinarity, part
of the University of North Texas in Denton, this year came up with
56 measures of impact (see Nature 497, 439; 2013), including influence
on curriculum creation, authorship of textbooks and success in
surveys of colleagues’ esteem. Some of these measures are a little fanciful,
but they demonstrate that it has never been easier for scientists
to show off the various ways in which their work deserves attention
— and funds.

That variety is worth celebrating, but it can lead to dizzying confusion.
How are researchers and evaluators to choose between measures?
In this issue, Nature looks at some traditional and emerging ways to
track research quality (see page 287). Ultimately, it is for institutions
and funders to choose their preferences, but in doing so they should
take two important considerations into account.

First, it is important to be aware of the positive and negative effects
of privileging certain measures.

For example, emphasizing that research is considered especially
important if it is published in one of a few historically influential
journals — Cell, Nature, Science — could be a laudable attempt to get
scientists to think ambitiously about their research goals. But it can also
result in excessive pressure to publish big claims, leading to problems
of irreproducibility. (Nature’s position is that it has been
publishing research using essentially the same criteria for decades; it
is up to the scientific community and evaluators to decide how much
importance they want to place on papers that appear in the journal.)

It is a mistake to consider a research paper important because it is
published in a journal with a good citation record, as measured by its
impact factor. As this publication has highlighted many times (see
in particular Nature 435, 1003-1004; 2005), two articles in the same
journal may have very different citation records. It is much better to
focus on the citations, views or downloads of an individual article —
and to recognize that these metrics vary between research disciplines.

In another example, emphasizing the economic impacts of research
may force scientists to think about justifying their taxpayer-funded
work, but it also runs the risk of distracting them with the lure of
meaningless patents and ill-considered spin-out companies.


The second important consideration is the need for research evalu- 
ators to be explicit about the methods they use to measure impact. 
Openness is an essential part of earning trust. Evaluators should pub- 
lish worked examples showing how they score assessments, and the 
reasoning behind such scores; even better would be, where possible, to 
publish the full data. Otherwise, researchers might rightfully feel sus- 
picious (see, for example, writer Colin Macilwain’s scepticism towards 
performance metrics: Nature 500, 255; 2013). 

When scientists rail against the ‘impact agenda’, their arguments
sometimes founder on irrelevant confusion between terms: too often,
such discussion devolves into attacks on misuse of the impact factor,
rather than looking at the range of possible metrics. The journal
citation measure gains misleading prominence because its name happens
to include the word impact — a semantic synergy that can cloud debate.

Arguments against impact metrics are 
strongest when they reference cases in which evaluators do not heed 
the considerations we mention above: in which evaluators choose 
metrics blindly, without sufficient thought for pernicious effects, or 
are secretive or inconsistent about their methodologies. If evaluators 
are to earn the acceptance — rather than the scorn — of the scientists 
whose work they want to fund, they had better pay attention to these 
concerns. 


High hopes 
Care must be taken not to raise unrealistic 
expectations for RTS,S malaria vaccine. 


Vaccines have been an unparalleled public-health success: they
have eradicated smallpox and driven polio to near extinction,
and routine childhood immunization saves millions of children
a year from death from diseases such as measles, diphtheria,
tetanus and whooping cough. So it is not surprising that the public
tend to view vaccines as synonymous with elimination, or near
elimination, of our microbial foes.

This may help to explain last week’s extensive and often upbeat media
coverage of the 18-month results of a huge phase III trial of the malaria
vaccine candidate RTS,S/AS01 in more than 15,000 children across
7 African countries. In the United Kingdom, for example, the front page
of The Guardian stated that the vaccine “could save lives of millions of
children”. Unfortunately, however, it won't. The 18-month results only
confirm the disappointing results seen after 12 months.

The RTS,S vaccine is not what most people would think of as a vac- 
cine. It provides only partial protection and most of those vaccinated, 
particularly those in areas with moderate to high malaria transmission 
rates, will eventually contract the disease. There is also confusion over 
its efficacy. Many media reports concluded that although the vaccine 
did not give the 90%-plus efficacy levels of most childhood vaccines, 
it might nonetheless be satisfactory, with a reported 46% reduction in 
cases in children vaccinated when they were aged 5 to 17 months, and 
27% in 6-12-week-old babies. 

Not so. The efficacy figures given for RTS,S are not directly comparable
with those usually given for vaccines. The conventional measurement
of a vaccine’s success is how many people remain protected after a
given period, such as 12 months. Because RTS,S is only partially protec- 
tive, a different measurement of efficacy is used — a complex statistical 
model that computes hazard ratios on the basis of the first clinical epi- 
sodes of malaria. As the designers of the method themselves concede, 
“a shortcoming of the vaccine efficacy calculated from hazard ratios 
could be that it is not intuitively understood”. Too true. In the hands of
experts, and regulatory agencies, this hazard-ratio model offers a valid
measurement of the efficacy of a partially protective vaccine, but it can
be easily misinterpreted by the media, politicians and policy-makers. 

It is not possible for outside scientists to deduce a more conventional 
efficacy estimate from the 18-month data, as it was described only 
briefly in press releases from the vaccine’s sponsors, the PATH Malaria 
Vaccine Initiative (MVI) based in Seattle, Washington, and GlaxoSmith- 
Kline (GSK), headquartered in Brentford, UK. (The paper and support- 
ing data are under review at a journal.) But applying a conventional 
measurement of vaccine success to the published figures for 12-month 
estimates — for which detailed data are available — reduces the vac- 
cine’s efficacy by more than one-third (see Nature 478, 439-440; 2011). 
Its protective effect also seems to begin fading after about six months. 


Perhaps more promising are the reductions seen in cases of severe 
malaria, which are reported in the conventional manner. However, 
although a 36% reduction was reported in children of 5-17 months, 
the 15% reduction seen in 6-12-week-old babies was not significant — 
and this age group was the main target of the trial because for logistical 
reasons it is likely that any malaria vaccine would need to be given 
alongside routine immunizations at this age. 

Many vaccine trial participants had access to other anti-malarial
measures — including insecticide-treated bednets and effective drug
treatment — so it is possible that the vaccine might offer greater
benefit to people more exposed to malaria. Nonetheless, the vaccine
falls short of the target for a partially protective
malaria vaccine set in 2006 by the World Health Organization, which 
stated that it should have a “protective efficacy of more than 50% against 
severe disease and death” that “lasts longer than one year”. 

The work will continue. Data on the effects of a booster dose given 
after 18 months will not be available until next year, and RTS,S is also 
due to be tested in combination with a vaccine developed by researchers 
at the University of Oxford, UK, in an early-stage clinical trial. Mean- 
while, the RTS,S trials are to be applauded for having left a lasting legacy 
in the unprecedented collaboration with African scientists who led the 
study, and a first-class clinical-trials infrastructure on the continent. 

RTS,S has been in the works for almost 30 years. Since 2001, the 
MVI has put some US$200 million into it, and GSK more than 
$350 million, with a further $260 million earmarked to complete its 
development. The huge past impact of vaccines risks fuelling illusions 
over the impact of having a malaria ‘vaccine’. But the modest efficacy
of RTS,S means that it falls squarely in competition with other malaria 
control measures, many of which might be more cost-effective. Care 
must be taken not to build excessive expectations that can only lead 
to disappointment over its potentially limited public-health impact.


Searching for life 


A look into the past frames our attempts to find 
extraterrestrial intelligence. 


Carl Sagan’s 1993 Nature paper has, rather appropriately, a hint of
science fiction about it. Twenty years ago this week, Sagan and
a team of other astronomers announced that they had found
life on a planet in the Galaxy. They used data from the Galileo
spacecraft to catch clear signatures of methane and carbon dioxide in
the planet’s atmosphere and abundant water in frozen and liquid states
on its surface. They even confirmed the presence of radio emissions
emanating from it — the canonical autograph of intelligence.

This month’s Nature PastCast — one of a series of special audio
treats available for free on the Nature website — recounts the tale. The
twist, of course, was that this planet was Earth. Sagan and his team
were trying out a method for finding life on other planets, using Earth
as a calibration for future missions that might explore the depths of
the Galaxy for signs of life.

Those were not friendly times for thinking about life elsewhere. At
the time, the US Congress was debating whether to cut federal funding
for NASA’s SETI programme, the search for extraterrestrial intelligence.
So Sagan and his team set about their task in as objective a way as they
could, notwithstanding their foregone conclusion. They were careful
to declare that “life is the hypothesis of last resort”, and to show that this
was a scientific question that needed an answer.

The bigger question, of course, goes unanswered, although not
for want of trying. SETI was launched in the late 1950s, propelled
by the optimism of the space age. In 1959, a paper in Nature by
Giuseppe Cocconi and Philip Morrison suggested that if civilizations 
elsewhere wanted to contact Earthlings, they would probably use 
electromagnetic signals. “We shall assume that long ago they estab- 
lished a channel of communication that would one day become known 
to us, and that they look forward patiently to the answering signals 
from the Sun which would make known to them that a new society 
has entered the community of intelligence,” they wrote.

Soon after, astronomer Frank Drake was preparing for one of the 
first conferences to address the search for extraterrestrial life. As a 
loose agenda, he came up with a list of unknowns that would need to 
be resolved in order to predict whether intelligent life exists elsewhere 
in the Universe. For example, how many star systems exist that are 
suitable for the development of intelligent life? How many Earth-like 
planets are in orbit around them? What is the probability of life sparking
into existence on any of them? Drake then formulated an equation
that created a mathematical framework for such unknowns. 
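
For reference, the equation is usually quoted in the form below. This is the standard textbook statement rather than anything reproduced in the editorial itself, and the symbol names follow common convention:

```latex
% Drake equation, standard textbook form (symbols follow common convention)
N = R_{*} \, f_{p} \, n_{e} \, f_{l} \, f_{i} \, f_{c} \, L
% N     : number of civilizations in the Galaxy whose signals could be detected
% R_*   : average rate of star formation in the Galaxy
% f_p   : fraction of those stars that have planetary systems
% n_e   : number of planets per such system with an environment suitable for life
% f_l   : fraction of suitable planets on which life actually appears
% f_i   : fraction of life-bearing planets on which intelligence emerges
% f_c   : fraction of intelligent civilizations that emit detectable signals
% L     : length of time over which such civilizations emit those signals
```

Each factor corresponds to one of the unknowns on Drake's list, so an estimate of N is only as good as the least certain term in the product.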

Research ongoing since Sagan's paper is making Drake's equation 
more solvable today than it has ever been. The control test was per- 
formed, so astronomers know that their tests for life would work. 
Meanwhile, the first exoplanet was found in 1992, and hundreds have 
been spotted since. 

Scientists can use variants of Sagan's prescient control test to 
characterize the atmospheres and locations of exoplanets whizzing 
around their stars. Are we now in an era not of space-age optimism, 
but of realism? Life is still the hypothesis of last resort for astro- 
biologists. But if they find none, they will not be 
disillusioned. It would be just as interesting, they 
say, to find that habitable-looking environments 
do not all sprout life, and that Earth is unique in 
being so full of it.


Time for global statistics we can count on

Public policy is too often derailed by assessments based on faulty data, says
Martin Rees, as he calls for the formation of an international data watchdog.

Science has become pervasive in public policy, and all of us who
are active in this arena are aware of the intense scrutiny that
scientific evidence rightly receives. Yet much of the data that
shape and underpin crucial areas of public policy, such as improving
health and reducing poverty, are substandard.

A report published this week makes a key recommendation to
address this gap in data quality: the establishment of a new international
agency, Worldstat. Worldstat would carry out quality control on
global statistics. It would assess and improve data-collection practices
and monitor for the misuse of statistics. Its role is crucial: unless all
countries gather and publish reliable and comparable data on topics
such as disease, income and employment, then international comparisons
of economic growth, health, life expectancies and so forth
cannot be relied on. Nor can such data form a firm basis for action by
governments or international agencies.

This proposal is part of a wider set of recommendations
that have emerged from a year-long
process by the Oxford Martin Commission for
Future Generations, in which I have participated
as a member. It is chaired by Pascal Lamy, former
director-general of the World Trade Organization.
The commission's 19 members, who hail
from many nations and have diverse political
and professional backgrounds, collectively
have broad experience and expertise. But what
brought them all to the table was a shared concern
that a prosperous, equitable and sustainable
global future is in jeopardy because modern politics
and businesses have become too preoccupied
with short-term pressures at the expense of long-term needs.

The result of our work is the publication of ‘Now for the Long
Term’, a report that proposes a set of principles aimed at overcoming
deep political and cultural barriers that obstruct a longer-term vision
(see www.oxfordmartin.ox.ac.uk/commission). It provides practical
recommendations for action to deliver progress on climate change,
reduce economic inequality, improve corporate practices and address
the chronic burden of disease.

Data vary so much across the world that in many fields it is almost
impossible to generate reliable comparisons. Often, the information is
simply not there. The paucity of economic data on key indices — such
as average income in poor countries, particularly in Africa — makes it
difficult to assess the true level of inequality and its drivers. Even in the
United Kingdom there is concern about proposals to scrap the decadal
census in its conventional form; whatever changes are made after the
current consultation period must not compro-


of governments and international organizations unless there are reliable 
performance indicators. Such indicators help to reduce corruption and 
waste. There are currently serious weaknesses in data quality in areas 
as diverse as health spending, mortality rates, gender representation 
and biodiversity; assessments of success and failure in public policy are 
often based on distorted or subjective perceptions. As a result, indicators
should come with a ‘health warning’ to emphasize their limitations.

Worldstat would not be a substitute for existing institutions such as 
the United Nations Statistical Commission or the UN Statistics Divi- 
sion (both within the UN Economic and Social Council). Instead, it 
would complement existing work by focusing on implementing agreed 
standards and improving the capabilities for archiving and interpreting 
data, particularly in the developing world. As a separate entity with 
a budget and resources on a scale comparable with Eurostat, World- 
stat could also fast-track international efforts 
to adopt appropriate and robust indicators for 
sustainable development and direct attention to 
capacity building on this front. This week’s report 
highlights, in particular, the need to devise a 
‘long-term impact index’ that could comprehensively
measure a country’s long-term progress
on a much broader range of indicators than the 
standard measure, gross domestic product. 

The report also includes recommendations for 
corporate reform, such as a voluntary taxation 
and regulatory exchange, to address tax abuse and 
avoidance and to harmonize company taxation 
arrangements. It also offers specific proposals for 
dealing with youth unemployment and poverty 
by removing price-distorting subsidies and investing in social protec- 
tion measures, such as conditional cash-transfer programmes. 

The commission’s work allowed us to identify the need for Worldstat, 
and the next step is to work with existing organizations to decide how 
the agency could be funded and organized, and to which bodies it would 
be accountable. Yet we are mindful that it is all too easy to propose new 
international agencies, and that few twentieth-century agencies have 
closed down, even though some are now anachronistic. To counter such 
proliferation, we suggest the introduction of ‘sunset clauses’ that require 
regular reviews of accomplishments to ensure that publicly funded inter- 
national institutions are fit for twenty-first-century purposes. 

Science and engineering can enhance lives in the developing world 
and safeguard the welfare of future generations. But there is a dismay- 
ing gap between what should be done and what actually happens. ‘Now 
for the Long Term’ presents a practical agenda that is designed to help 
overcome the gap between knowledge and action.


Martin Rees is a former president of the Royal Society and a fellow of 
Trinity College, University of Cambridge, UK. 
e-mail: mjr36@cam.ac.uk 
Immune-cell block
halts brain tumour 


A drug that targets the white 
blood cells fostering brain 
tumours — rather than the 
cancer cells themselves — 
shrinks tumours in mice. 

The aggressive brain cancer 
known as glioblastoma is 
notoriously difficult to treat. 
Johanna Joyce at the Memorial 
Sloan-Kettering Cancer 
Center in New York and her 
colleagues gave mice with 
glioblastomas a drug that 
inhibits a cell-surface protein 
called colony-stimulating 
factor-1 receptor. This 
protein is expressed mainly 
on the white blood cells, or 
macrophages, that surround 
the tumour. 

The 22 mice that did not 
receive the drug all died within 
8 weeks. By contrast, 64% 
of the mice given the drug 
were still alive after 26 weeks. 
Surprisingly, the drug did 
not kill the macrophages but 
instead altered their gene 
expression, presumably 
turning off tumour-promoting 
functions. 

Nature Med. 19, 1264-1272 
(2013) 


Shining light on 
cold turtles 


Freshwater turtles can survive 
the winter at the bottom of 
frozen lakes despite a complete 
lack of oxygen. But they do not, 
as some have suggested, fall 
into a coma when hibernating, 
according to Jesper Madsen of 
Aarhus University in Denmark 
and his colleagues. 

Madsen and his team 
submerged Trachemys scripta 
turtles (pictured) in cold, 
oxygen-depleted water to put 
them into false hibernation. 
The animals still responded
to light and increased
temperatures, but not to
vibrations or increased oxygen
levels. The results suggest that
hibernating turtles are in a
low-energy but vigilant state.
The brains of chemically
anaesthetized turtles also
responded to light, indicating
that these animals have
adapted to remain responsive
to this stimulus even when
other body systems are shut
down.
Biol. Lett. 9, 20130602 (2013)


Why bee colonies collapse


Environmental stresses can cause bee colonies
to fail — even if the stress levels are not high
enough to kill individual insects.

Habitat decline, parasites and insecticides
have all been blamed for bee colony collapses,
but finding the individual causes of collapse has
been problematic.

John Bryden of Royal Holloway University of
London and his colleagues modelled stresses on
bees and found that colonies began to decline
when the number of functionally impaired
bees reached a critical threshold. The model
accurately predicted the fate of 16 experimental
colonies of bumblebees (Bombus terrestris), of
which half were exposed to a neonicotinoid
pesticide at levels that do not kill bees but do
reduce their ability to learn and gather food.

Multiple stresses can put colonies on a knife
edge between growth and failure, the authors say,
which makes it hard to pin declines on one factor.
Ecol. Lett. http://dx.doi.org/10.1111/ele.12188
(2013)


From synthesis to 
pill without pause 


A factory that produces
a continuous stream of
drug tablets from the raw
ingredients could save time 
and money over traditional 
stop-start methods, which 
spread manufacture over many 
locations. Bernhardt Trout and 
his colleagues at the
Massachusetts Institute of
Technology in Cambridge
report the first
example of such a plant, an 
18-square-metre factory that 
produces the hypertension 
drug aliskiren (developed by 
Novartis, which funded the 
project). 

Chemical building blocks 
flow in at one end, followed 
by a series of reactions and 
separations in which the drug 
is synthesized, crystallized, 
dried and coated to produce 
tablets at the other. 

Angew. Chem. Int. Edn
http://dx.doi.org/10.1002/anie.201305429 (2013)
ASTRONOMY


Reflections from
a black hole


Sagittarius A*, the unusually 
dim supermassive black hole 
at the heart of the Milky Way 
(pictured), may have flared 
up at times during the past few 
centuries. 

Observations from NASA's 
Chandra X-ray Observatory 
between 1999 and 2011 reveal 
bright patches of X-rays in 
molecular clouds in the centre 
of the Galaxy. The glow, which 
has been rippling outwards, 
could be reflected radiation 
that was emitted from the 
black hole long ago, and then 
bounced off nearby clouds. 

Maica Clavel of the Paris 
Diderot University and her 
colleagues calculate that two 
sudden surges in the black 
hole's activity — one lasting 
no more than two years, and 
the other about a decade — 
could produce the observed 
reflections. Charting them may 
help to reveal what the black 
hole has been consuming. 
Astron. Astrophys. 558, A32 
(2013) 


Two-faced 
cancer gene 


A genetic variant that greatly 
boosts the risk of testicular 
cancer may protect light- 
skinned individuals from skin 
cancer by helping them to tan. 
A team led by Douglas Bell 
at the US National Institutes 
of Health in Bethesda, 
Maryland, and Gareth Bond 
at the University of Oxford, 
UK, surveyed data from
genome-wide association
studies (GWAS). They focused 
on polymorphisms in DNA 
binding sites for the tumour- 
suppressor protein p53. One 
variant, in a gene called KITLG, 
has one of the strongest effects 
of any pro-cancer variant 
identified by GWAS and
was vastly more common
in caucasians. In mice, the
p53-KITLG interaction 
boosted the growth of pigment- 
producing cells after exposure 
to ultraviolet radiation, and so 
might protect against excessive 
sun damage and cancer. 

Cell 155, 410-422 (2013) 


CLIMATE SCIENCE 


Ozone hole fans 
African heat 


Much of the summer warming 
in southern Africa over recent 
decades seems to have been 
due to the ozone hole over 
Antarctica. 

Desmond Manatsa, 
of Bindura University of 
Science in Zimbabwe, and 
colleagues compared regional 
climates before and after 
ozone depletion set in around 
1993. Pronounced surface 
warming strongly correlated 
with shifts in the strength 
and position of pressure 
systems in the atmosphere that 
enhance the southward flow 
of warm tropical air. These 
shifts are often attributed to 
ozone depletion in the upper 
atmosphere. 

The expected closure of the 
ozone hole by 2050 may help 
to mitigate climate warming in 
southern Africa, the authors 
conclude. 

Nature Geosci. 
http://doi.org/n8x (2013) 

For a longer story on this research, 
see go.nature.com/ph2nyo 


Steel mesh sucks in fog


One square metre of high-tech mesh could
capture 12 litres of potable water a day
from morning fog.

Fog-harvesters are used in countries
such as Chile to collect drinking water. Tiny fog droplets in
humid air blow through mesh filaments, where they coalesce
into larger drops that roll into a collecting trough.

To boost the amount of water captured, Robert Cohen
and Gareth McKinley at the Massachusetts Institute of
Technology in Cambridge and their colleagues systematically
modelled and tested surface chemistries, fibre thicknesses and
geometrical configurations of woven mesh. They produced a
mesh of thin steel strands coated with a fluorinated polymer.
In laboratory tests simulating Chilean mountain fog, the mesh
collected water at rates fivefold higher than conventional
fog-catching meshes and came close to the theoretical limit
calculated by the researchers.

Langmuir http://doi.org/n7n (2013)


NEUROSCIENCE
How exercise
boosts the brain


A protein molecule secreted by
muscles during exercise boosts
the expression of factors that
help to protect brain neurons.
Endurance exercise is
known to induce production
of a protein called irisin and to
improve cognitive performance 
in patients with some 
neurological conditions, but 
the mechanism linking these 
observations has been unclear. 
A team led by Michael 
Greenberg and Bruce 
Spiegelman at Harvard 
Medical School in Boston, 
Massachusetts, observed that 
irisin levels increased in a brain 
area associated with learning 
and memory after mice ran on 
exercise wheels. When levels 
of irisin or its progenitor were 
raised experimentally in the 
blood and in neurons, genes 
associated with learning and 
memory became active in the 
brain. 
Cell Metab. http://doi.org/n8z 
(2013) 


Jellyfish wake 
makes power 


When a jellyfish swims, its 
pulsing body gets an extra 
push from spinning water in 
its wake (pictured, red). 

Brad Gemmell at the Marine 
Biological Laboratory in 
Woods Hole, Massachusetts, 
and his colleagues studied how
water flows around pulsing
or paralysed moon jellyfish 
(Aurelia aurita). The animal’s 
bell-shaped body produces 
vortex rings of water as it 
pulsates. 

As the bell flattens and 
expands, one of the rings rolls 
inside the bell and sucks in 
more water. That pushes the 
jellyfish forward without any 
extra force from the creature’s 
muscles. This makes the simple 
predator one of the most 
energy-efficient swimmers on 
the planet, the authors say. 
Proc. Natl Acad. Sci. USA
http://doi.org/n7k (2013)
For a longer story on this research, 
see go.nature.com/gbbmhv 
US labs to close


The US Department of Energy 
is planning to temporarily 
close some of its national 
laboratories, as the ongoing 
US government shutdown 
prevents the agency from 
paying the contractors that run 
the labs. Los Alamos National 
Laboratory in New Mexico 
will close on 18 October, and 
the nearby Sandia National 
Laboratories in Albuquerque 
will close on 21 October. Other 
facilities, including Pacific
Northwest National Laboratory
in Richland, Washington, have
funds to remain open until
at least early November. See
go.nature.com/ehtlbi for more.


Antarctic freeze 
The US National Science
Foundation is recalling
staff and scientists from
Antarctica as a result of the
US government shutdown,
it announced on 8 October.
Almost all science at the three
US bases will cease, and only
a minimal crew will stay to
maintain facilities. Depending
on the length of the shutdown,
which began on 1 October, the
move could effectively end this
year’s fieldwork at McMurdo,
Amundsen-Scott and Palmer
stations. See go.nature.com/w48czc
for more.


Mercury treaty 
More than 90 countries 
signed a treaty to limit 
mercury use and pollution at 
a United Nations conference 
in Kumamoto, Japan, on
10 October. The Minamata
Convention on Mercury 
seeks to curb emissions of the 
metal from power plants and 
other industrial facilities, and 
to limit its use in products 
from batteries and light bulbs 
to cosmetics and medical 
equipment (see go.nature.com/vqch6y).
The treaty will enter
into force once it has been
ratified by 50 countries, which
is expected to take three to four
years.


Warnings save lives in Indian cyclone


India’s strongest cyclone in 14 years tore up
crops and blew away buildings, but caused
fewer casualties than feared owing to advance
evacuations of at least 1 million people. Cyclone
Phailin hit northeastern India on 12 October,
with satellite estimates from the US Joint
Typhoon Warning Center suggesting that, before
it made landfall, storm winds blew at up to 260
kilometres per hour: on a par with a 1999 cyclone
that killed some 10,000 people in the region.
Phailin weakened before it hit land; it caused
hundreds of millions of dollars of damage, but,
in contrast to the 1999 storm, only 27 deaths had
been reported as Nature went to press.


Stem-cell trial off 
Italy’s health minister put
an end to a planned clinical
trial of a controversial stem-cell
therapy on 10 October,
which the government
had previously agreed to
finance with €3 million
(US$4 million). In a stinging
report, the scientific advisory
committee appointed by
the minister said that the
clinical protocol proposed by 
the Brescia-based Stamina 
Foundation was inadequately 
described, lacked a scientific 
basis and was potentially 
dangerous. The therapy has 
divided Italian society for 
more than a year (see Nature 
495, 418-419; 499, 125; 2013). 
Fracking studies

The European Parliament 
proposed on 9 October to 
toughen regulations on the 
use of hydraulic fracturing 

for oil and gas exploration. 
The technique, also known as 
fracking, involves pumping 

a slurry of water, sand and 
chemicals underground to 
fracture shale formations and 
release hydrocarbons. Under 
the proposed rule, which 

was adopted by 332 votes to 
311, companies seeking to 
exploit shale formations would 
first be required to conduct 
environmental-impact studies. 


E-cigarette vote 

On 8 October, the European 
Parliament voted to tighten 
regulations on tobacco 
products across the European 
Union (EU), but opted to 


scale back proposed rules 

for electronic cigarettes. The 
European Commission and 
the European Council, which 
represents the governments 
of the 28 EU member states, 
have pushed for regulation 

of e-cigarettes as medical 
devices. The parliament 
voted to treat them as tobacco 
products unless they are 
marketed with health claims. 
Europe's legislative bodies 
must now negotiate and agree 
on the final legislation. See 
go.nature.com/fpk7ra for 
more. 


Malaria trial 
GlaxoSmithKline (GSK) will 
apply next year for European 
regulatory approval of the 
candidate malaria vaccine
RTS,S/AS01, the London-based
pharmaceutical giant
announced on 8 October. 
GSK and the PATH 

Malaria Vaccine Initiative, 

a global programme that is 
co-developing the treatment, 
released 18-month follow-up 
data from a phase III clinical 
trial of children in Africa, 
which largely reinforced 
results reported at 12 months 
(see go.nature.com/2bgpl8). 
The treatment offered only 
modest protection for most 
children. It showed especially 
weak results in babies treated 
at 6-12 weeks of age — the 
vaccine’s target group (see
go.nature.com/gmw9ib). See 
page 271 for more. 


Badger beef 


A UK experiment to 

control the spread of bovine 
tuberculosis by culling 
badgers (Meles meles) is 
causing political strife. 
After markedly fewer 
badgers were killed than 
originally mandated for the 
six-week pilot cull, the UK 
environment department 
said that it would extend the 
experiment. Environment 
minister Owen Paterson
said that there were fewer 
badgers than thought in the 
pilot regions, so the cull had 
been successful. Asked last 
week if he was “moving the 
goalposts” to make this claim, 
he replied that “the badgers 
have moved the goalposts”. 


SERGEY MAMONTOV/RIA N 


TREND WATCH


The number of planets detected
outside the Solar System was
expected to surpass 1,000 this
week, according to the Extrasolar
Planets Encyclopaedia, which
is run by Jean Schneider of the
Paris Observatory. But Schneider
notes that the precise count is
complicated by the lack of any
consensus on what defines a
planet, and uncertainty about
some remote measurements.
NASA’s Kepler mission has
identified thousands more
candidate exoplanets that have
yet to be confirmed.


SOURCE: EXOPLANET.EU


PEOPLE 
New space chief 


Russian space agency chief 
Vladimir Popovkin was 
replaced on 10 October 
following a series of failed 
launches. In July, a Russian 
Proton-M rocket carrying 
navigation satellites crashed 
seconds after launching. 
Former deputy defence 
minister Oleg Ostapenko 
(pictured) was named to take 
over the federal space agency 
Roscosmos. See go.nature.com/vwuqun for more.


Chemistry Nobel 
This year’s Nobel Prize 

in Chemistry was won by 
computational biologist 
Michael Levitt at Stanford 
University School of 
Medicine, California, 
together with chemists 


Martin Karplus at the 
University of Strasbourg 
in France and Harvard 
University in Cambridge, 
Massachusetts, and Arieh 
Warshel at the University 
of Southern California, 
Los Angeles, for their work 
on computer modelling of 
chemical interactions. See 
page 280 for more. 


Nobel Peace Prize 
The Organisation for the 
Prohibition of Chemical 
Weapons was awarded the 
2013 Nobel Peace Prize on 
11 October. The international 
body, which is based in The 
Hague, the Netherlands, was 
recognized for its extensive 
efforts to eliminate chemical 
weapons, which include 
overseeing the current 
process to destroy Syria's 
arsenal. See go.nature.com/3lf4kv for more.


BUSINESS
Carbon-capture fall 


Plans for large-scale 
projects to capture and store 
carbon dioxide emissions 
are declining, according 

to the Global Carbon 
Capture and Storage (CCS) 
Institute, a non-profit CCS- 
supporting company based 
in Melbourne, Australia. In 
its annual report, published 
on 10 October, the institute 
counted 65 planned and 
ongoing projects, down from
75 in 2012, a decrease it
attributes in part to policy
uncertainty. None of the
12 projects now in operation
is at a power plant, but
one large coal plant in
Saskatchewan, Canada, is
scheduled to start capturing
carbon in April.


EXOPLANET CATALOGUE NEARS 1,000


Most recent exoplanets were found by NASA's Kepler mission,
which detected dips in starlight caused by a transiting planet.


SEVEN DAYS | THIS WEEK


21-27 OCTOBER
Improved assessment
tools and science-
based management
of marine ecosystems
are discussed at the
3rd International
Marine Protected Areas
Congress in Marseilles,
France.
go.nature.com/hrtazt


22-25 OCTOBER
São Paulo, Brazil, hosts
a conference celebrating
15 years of the SciELO
Network, an open-
access platform for
scientific publishing.
Topics include trends
in open access,
metrics for journal
quality and research-
communication policy.
go.nature.com/tw2tkd


Lithium concerns 


The US Government 
Accountability Office in 
Washington DC has warned 
of a looming shortage of 
lithium-7, an
isotope used to maintain safe 
cooling at more than half of 
the 100 nuclear power plants 
in the United States. On 

9 October, the agency said 
that previous government 
assessments underestimated 
the country’s demand for 

the isotope and overlooked 
uncertainties about future 
supplies. The isotope has not 
been produced in the United 
States since 1963, and can be 
obtained only from Russia or 
China. 


BILL BRANSON/NIAMS/NIH


Social scientists 
apply to study the inner 
workings of the IPCC p.281 


Delays force
ITER experiment to go all out
for power p.282


Researchers 
could soon get their hands 
on UK patient records p.283 


Nature looks at ways 
of tracking the influence 
of research p.287 


GOVERNMENT 


A skeleton staff at the US National Institutes of Health has struggled to keep experiments afloat. 




NIH campus endures 
slow decay 


Experiments suffer from lack of lab materials and staff 
during US government shutdown. 


BY SARA REARDON 


Bleak grey skies mirror the mood of the
skeleton staff trickling through the gates 


of the main US National Institutes of 
Health campus in Bethesda, Maryland. Most 
of the principal investigators are absent: with- 
out students to advise or meetings to attend, 
there is little point in being there. Perhaps one 


out of every ten windows is lit up, revealing 
lonely postdocs working on what few experi- 
ments they are allowed to maintain as the US 
government shutdown drags on. 

On 1 October, after federal budget nego- 
tiations reached an impasse and forced 
the shutdown, the NIH sent 73% of its 
18,646 employees home. During the second 
week of the shutdown, the US Department 


of Health and Human Services put nearly 
1,000 more on unpaid furlough, or enforced 
leave. As Nature went to press, there were sug- 
gestions that the Republican-controlled US 
House of Representatives could come to a deal 
with the presidential administration and the 
Democratic-controlled Senate, which could 
reopen the government. But during a visit to 
the NIH on 9 October, Nature found remaining 
staff members grimly working to keep crucial 
research efforts afloat. Notably, 1,437 clinical 
studies are continuing and a few trials have 
been able to enrol a handful of desperately ill 
patients. Technicians at animal facilities have 
stayed on, ensuring that the NIH’s 1.4 mil- 
lion rodents and 3,900 non-human primates 
receive care. And several hundred employees 
are allowed to maintain irreplaceable cell lines. 

Yet researchers are still finding themselves 
severely hobbled. One of the worst problems, 
some say, is the ban on ordering necessary 
lab materials such as enzymes and chemi- 
cals for culturing cells. “We can hold out for 
maybe a couple weeks with what we have, then 
we’re in real trouble,” says one lab head from
the National Institute of Allergy and Infec- 
tious Diseases (NIAID). Like all of the NIH 
employees who spoke to Nature, he asked to 
remain anonymous because he is not author- 
ized to talk to the media. Many experiments 
are being frozen — in some cases literally — as 
labs decide which can continue, which must be 
put on hold and which have to be abandoned. 
“If this goes on, whole experiments will begin
to crumble,” says the NIAID researcher. 

With confusion reigning, the shutdown is 
playing out in different ways across the NIH’s 
27 institutes and centres. At the NIAID, for 
instance, lab heads have been instructed that 
they cannot have more than two people in a lab
at any given time. Some institutes are allowing 
lab heads to recall workers as needed, whereas 
others have issued no clear directives. A few 
postdocs are ignoring the furlough, saying that 
they have had no specific orders to leave cam- 
pus. One says that he and his colleagues have 
not been bothered yet, but he worries about 
being revealed if an accident occurs in the lab. 

Lab heads, some of whom are themselves 
barred from campus, say that they have been 
told to write out precisely what each employee 
should be doing each day, and to justify each 
project. Mike Askenase, a graduate student at 
the University of Pennsylvania in Philadelphia 
who does research at the NIAID, is allowed 
to work for just eight hours a week and
only on experiments “that would cost more 
to shut down than to continue”. He says 
his lab studies “mouse plague”: Yersinia 
pseudotuberculosis, which in mice causes 
cysts and gut problems over the course of 
two to three months. The disease progres- 
sion cannot be rescheduled, he says, and 
most researchers did not count on a shut- 
down when they started their experiments 
months ago. With just two people at a time 
in the lab, some parts of the experiments 
may go unfinished. 

Researchers working on animals are 
among the most worried. One postdoc 
from the National Cancer Institute says 
that her security access was revoked at first, 
but after her adviser pleaded her case, she 
was given permission to enter her building 
for one hour per day to advise the techni- 
cians who are caring for her mice. The 
rodents were injected with cancerous cells 
several months ago, she says, and some of 
their tumours have now grown so large that 
the animals need to be killed. She is grate- 
ful for that one hour, she says, because it 
allows her to direct the technicians to take 
tissue biopsies so that she will be able to 
pick her experiment back up once the 
shutdown ends. 

Other animal researchers say that the 
shutdown is affecting their projects in more 
unpredictable ways. One NIH scientist who 
works with primates says that it is keeping 
him from retrieving samples from the pri- 
mate facilities in Poolesville, Maryland, 
40 kilometres from Bethesda. Regulations 
require that animal-tissue specimens be 
transported in a government car, but the 
shutdown has kept government vehicles 
out of use. And if he cannot do his work, 
which involves human therapeutics, the 
researcher questions the morality of keep- 
ing the primates. “I don't think it’s ethical to 
have an animal in a cage if we're not doing 
experiments on it,” he says.

There is one place on campus that still 
seems to be doing good business: the small 
cafeteria in the NIH’s Clinical Center. Nor- 
mally used by patients, it is now the only 
place open to eat. The buzz of conversa- 
tion there seems muffled. Discussions of 
science are overshadowed by doubt, worry 
and uncertainty.
NOBEL PRIZE


Modellers react to 
chemistry award 


Prize proves that theorists can measure up to experimenters. 


BY RICHARD VAN NOORDEN 


Computer modelling is one of the many
scientific fields that Alfred Nobel, 
understandably, failed to anticipate in 

his 1895 will. And so, as Michael Levitt points 

out, “there’s no Nobel prize for computer 
science”. But computation’s increasing impor- 
tance in chemistry and biology was recognized 
last week, when Levitt, of Stanford University in 

California, was one of three scientists to receive 

the chemistry Nobel for their work on ways to 

simulate the activity of large molecules — from 
cellular enzymes to light-absorbing dyes. 

“Computers in biology have not been
sufficiently appreciated,” Levitt said
at a press confer- 
ence, joking that a 
fourth portion of 
the Nobel might have gone to the chip manu- 
facturers, who have driven up computing 
power exponentially. 

Together with Martin Karplus of the Uni- 
versity of Strasbourg in France and Harvard 
University in Cambridge, Massachusetts, and 
Arieh Warshel of the University of Southern 
California in Los Angeles, Levitt was honoured 
for a specific modelling technique: working out 
how to stitch together descriptions of molecules 
at close-up and zoomed-out scales. 

The three were trailblazers in the 1970s. 
At the time, finely detailed quantum- 
mechanical pictures of bond making and 
breaking could not be calculated for more 
than a cluster of atoms — even today they 
are too complex to be computable beyond a 
few hundred atoms, and cannot be used to 
model whole proteins. So Levitt, Warshel and 
Karplus worked out how to merge these models
with simplified simulations that treat molecules
as non-reacting, vibrating atomic balls
connected by springs. “The art is to find an
approximation simple enough to be computable,
but not so simple that you lose the useful
detail,” Levitt says.

These multi-scale models have proved 
essential for studying the workings of enzyme 
reactions, and were pioneered in a 1976 paper 
in which Warshel and Levitt explained how 
lysozyme cleaves a glycosidic bond. Multi- 
scale techniques are not widely used in the drug 
industry, adds Kenneth Merz, who heads the 
Institute for Cyber-Enabled Research at Michi- 
gan State University in East Lansing. Instead, 
says theorist Christopher Cramer of the Uni- 
versity of Minnesota in Minneapolis, they find 
uses in, for example, revealing how industrial 
catalysts work, or examining how light activates 
dyes on semiconducting nanoparticles. 

The award is also being viewed as an 
acknowledgement of the three scientists’ life- 
time work in molecular simulation, research- 
ers told Nature. “They have made theory an 
equal partner to experiment,” said theoretical
chemist Gunnar Karlström of Lund University
in Sweden, a member of the Nobel committee. 

Still, a question mark remains over whether 
theorists can make predictions that surprise 
experimenters. Computer modelling “is really 
good at helping people understand why things 
work the way they do, but not so good at pre- 
dicting new things. We are good at guiding 
experimentalists,” says Ken Houk, who uses
computer programs to design new enzymes at 
the University of California, Los Angeles. 

Experimenters should be cautious of simulation
results, agrees Warshel. But “one day
everything will be done by powerful computers”,
he predicts.

Cramer adds: “Every year, hazardous-waste 
disposal gets more expensive, whereas com- 
puting power gets cheaper. So the progress 
curves favour the theoreticians.”


IPCC head Rajendra Pachauri (left) could be observed by sociologists studying meeting interactions.


CLIMATE POLICY 


Study aims to put 
IPCC under a lens 


Social scientists want to examine how climate panel’s 
internal dynamics affect outcomes. 


BY JEFF TOLLEFSON 


Every six years or so, the Intergovernmental
Panel on Climate Change (IPCC) reports

on everything that is known about global 
warming. In between, at meetings behind 
closed doors, hundreds of scientists review stud- 
ies, discuss the latest model results and assess 
humanity’s options. The eventual result is a 
series of reports, such as the scientific analysis 
released last month — the first component in 
the fifth assessment on climate change, which 
will also cover adaptation and mitigation. But 
how, exactly, do these reports get written? 

This week, at its latest plenary session in the 
Black Sea city of Batumi, Georgia, the IPCC will 
consider a request from social scientists who, 
armed with recorders, want to study the inter- 
actions that take place within the panel's many 
meetings. The proposal, part of a larger “assess- 
ment of assessments” funded by the US National 
Science Foundation, could offer insight into the 
ways that social dynamics, unconscious biases 
and seemingly mundane rules affect the final 
product — and what might be done to improve 
the process. 

“Most previous studies have looked at how 
the IPCC interacts with the outside world, but 
we’re interested in how it interacts with itself,”
says Michael Oppenheimer, a climate scientist 


at Princeton University in New Jersey, who 
recognizes his own biases as a participant in the 
past four climate assessments. “The truth is that 
very little is known about the actual process.” 
Sociologists of science have long been inter- 
ested in the way that social dynamics subtly 
tug at the scientific process, and the IPCC, a 
prominent target because of its international 
stature and political significance, has been 
studied before. In a pair of recent papers 
(K. Brysse et al. Global Environ. Change 23, 
327-337; 2013, and J. O'Reilly et al. Soc. 
Stud. Sci. 42, 709-731; 2012), which cen- 
tred on the treatment of rising sea levels in 
the 2007 assessment, Oppenheimer and 
his colleagues argued that the IPCC tends 
towards caution and errs “on the side of least 
drama”. Their results were based on IPCC 
documents and subsequent interviews, but 
the researchers did not have access to the 
meetings in which the issue was thrashed out. 
In the studies, the researchers focused on the 
IPCC’s decision to exclude melting of the West 
Antarctic Ice Sheet from its review, because 
of uncertainty in the models used to predict 
the ice sheet’s behaviour. As a result, the final 
assessment projected a sea-level rise of up to 
59 centimetres by 2100, even though new 
studies had suggested that an acceleration 
in ice-sheet melting could raise ocean levels 
substantially higher. Oppenheimer and his
colleagues found that the exclusion decision was 
driven by a few key scientists and a requirement 
for consensus, and also by an assessment struc- 
ture that split the treatment of sea ice into three 
chapters covering the past, present and future. 
This complicated the assessment process by 
involving more scientists and overemphasized 
uncertainty, the researchers argued. 

The social scientists now want to go deeper 
and get inside the IPCC meetings as they 
happen. That would allow the team to observe 
scientists’ interactions and to interview pivotal 
actors on the spot, rather than trying to recon- 
struct events. Treating scientists as subjects, 
however, raises privacy issues, given that the 
deliberations are confidential, a policy that is 
intended to allow scientists to talk openly without 
fear of having off-the-cuff remarks broadcast to 
the world. The social scientists also recognize that 
the mere act of observation could affect the pro- 
cess, although they say that their protocols would 
be designed to minimize intrusion and maintain 
confidentiality. 

Such fears led the IPCC to deny the team’s 
first request for access in 2010. It came at a 
particularly sensitive time, when the panel was 
reviewing its procedures after an embarrass- 
ing mistake (regarding Himalayan glaciers) 
in the previous assessment, and following the 
‘Climategate’ controversy, sparked by the leak 
of thousands of private e-mails from the Uni- 
versity of East Anglia, UK, revealing private 
discussions between leading climate scientists. 

Procedure was also an issue: the request 
came after the IPCC had enlisted scientists and 
laid out the ground rules for the fifth assessment.
“It was perceived by some as a change of
the rules in the middle of the game,” says IPCC
vice-chair Jean-Pascal van Ypersele, a clima- 
tologist at the Catholic University of Leuven 
in Belgium. 

Van Ypersele suspects that the proposal will be
received more warmly this time around by
member governments, because it
could be folded into
the IPCC assessment
process from the outset.
This week, the
IPCC is likely to either approve the proposal,
or delay action until after the fifth assessment
is completed next year, he says.

Naomi Oreskes, a science historian at Har- 
vard University in Cambridge, Massachusetts, 
and a member of the prospective study team, 
argues that allowing ethnographers inside 
the process will promote transparency and 
enhance the panel's credibility with the public
and policy-makers. “Most people on the
outside have no idea what the IPCC does,” she
says, and the black box inevitably generates 
suspicion and fuels criticism among climate 
sceptics. Clarifying the process might make the 
IPCC’s assessments seem a little less like magic
and a little more like sausage-making.
NUCLEAR FUSION


A FUSION OF IDEAS 


ITER’s reactor is a tokamak, in which the fuel 

is contained in a doughnut-shaped vessel and heated 
to ten times the temperature of the Sun’s core, 
forming a plasma, a hot, electrically charged gas. 


1. VACUUM VESSEL 
A huge stainless steel container will hold the 
plasma and house the fusion reaction. 


2. HEATING 

Neutral beam injections and radio-frequency 
electromagnetic waves will heat the plasma to 
150,000,000 °C. 


3. MAGNETS 

Ten thousand tonnes of superconducting magnets 
generating a field 200,000 times that of Earth's 
magnetic field will confine and shape the plasma. 


4. BLANKET 

Tiles weighing up to 4 tonnes will protect the 
vacuum vessel and magnets from heat and 
neutrons. 


5. DIVERTOR 

A series of tungsten tiles under the vacuum
vessel take exhaust heat and gases away 
from the tokamak. 


6. DIAGNOSTICS 

Key experimental tools (including pressure gauges 
and neutron cameras) for measuring the physics of 
plasmas. 


7. CRYOSTAT 

A huge refrigerator surrounding the vacuum vessel, 
protecting the superconducting magnets and other 
equipment from heat. 


ITER keeps eye on prize


Construction delays force rethink of research programme, but fusion target still on track. 


BY DECLAN BUTLER 


Delays in the installation of key parts
of ITER, a multibillion-euro international
nuclear-fusion experiment, are
forcing scientists to change ITER’s research 
programme to focus exclusively on the key 
goal of generating power by 2028. As a result,
much research considered non-essential to the 
target, including some basic physics and stud- 
ies of plasmas aimed at better understanding 
industrial-scale fusion, will be postponed. 
Nature has learned that the plans form 
the main thrust of recommendations by a 
21-strong expert panel of international plasma 
scientists and ITER staff, convened to reassess 
the project’s research plan in the light of the 
construction delays. The plans were discussed 
this week at a meeting of ITER’s Science and 
Technology Advisory Committee (STAC). 
The meeting is the start of a year-long review 
by ITER to try to keep the experiment on track 
to generate 500 MW of power from an input of 
50 MW by 2028, and so hit its target of attain- 
ing the so-called Q= 10, where power output 
is ten times input or more. 


ITER, which will be the world’s largest 
tokamak thermonuclear reactor (see ‘A fusion of 
ideas’), is being built in St-Paul-lez-Durance in 
southern France by the European Union, China, 
India, Japan, South Korea, Russia and the United 
States at a cost of €15 billion (US$20.3 billion). 
Q ≥ 10 is seen as its raison d’être, and achieving it
would be likely to revitalize public and political 
interest in fusion. Crucial to that is getting to the 
point, scheduled for 2027, when the first nuclear 
fuel would be injected into the reactor. The fuel 
will be a plasma of two heavy hydrogen isotopes,
deuterium and tritium (DT). 

The original 2010 research plan foresaw the 
entire reactor being built by 2020, when ITER 
was also scheduled to produce its first plasma, 
using hydrogen as a test fuel. But cost-cutting 
and cash-flow problems in member states 
mean that while the reactor is likely to be 
operating by then, the delivery of some parts is 
being deferred until several years later. These 
include some diagnostics devices for analys- 
ing the physics of plasmas at the very large 
scales of ITER, and elements of the heating 
system that will eventually take the plasmas 
to 150,000,000 °C. 
“The plan was that everything would be
procured and installed before first plasma, 
and then we would go straight into operation
with a full set of systems,” says David
Campbell, head of ITER’s plasma directo- 
rate. Instead, researchers will start with an 
initial set of instruments and systems, with 
others added later as upgrades. One of the 
main aims of the STAC meeting was for ITER 
to learn what elements of the research pro- 
gramme were essential to keeping it on track 
to reach DT phase and Q = 10 on schedule. 
A local plant that will produce tritium, for 
example, is one key element. 

The outcome of the review is also expected 
to influence ITER member states’ deferral 
plans, which will be modified to meet the key 
scientific priorities identified in the review. By 
fixing a timetable, Campbell says, STAC “will 
match up delivery schedules to the research 
plan, so that the research plan is not waiting 
for stuff to be delivered”. 

The likely consequence of capping costs is 
that some parts of the research plan will be 
postponed until after 2028. ITER initially aims 
to produce a Q> 10 for a few seconds, and then 




for pulses of 300-500 seconds, and work up 
over the following decade to output ratios of 
30 times more power out than in, with pulses 
lasting almost an hour. Eventually the aim is 
to develop steady-state plasmas, which will 
yield information relevant to industrial-scale 
fusion-power generation. It is experiments 
relating to the understanding of longer-pulse 
and steady-state ITER plasmas that are most 
likely to be delayed beyond 2028. 

Research into better plasma performance, 
and with it greater energy output, may 
also be held back, along with experiments 


investigating how to control turbulence, which 
can damage the reactor wall, and the stability 
and energy characteristics of plasmas. 

Olivier Sauter at the Swiss Federal Institute 
of Technology in Lausanne, Switzerland, one 
of the reviewers of ITER’s research plan, says 
that months or more might be cut from the 
time needed to reach DT. But ITER’s decision 
to take shortcuts also carries risks, he adds. To 
help mitigate these, ITER is working closely 
with researchers at other tokamaks around 
the world, such as the Joint European Torus 
in Oxfordshire, UK, to address some of the 
uncertainties likely to be encountered in
plasma energies and stability. 

“It is somewhat unfortunate that the compression
of the ITER schedule will limit interesting
research opportunities during the early
stages of ITER operation, but the mission of
ITER is clear,” says Mickey Wade, director of
the US national DIII-D fusion programme at 
General Atomics in San Diego, and a member 
of the review panel advising STAC. “The ITER 
physics team has done an admirable job of 
maintaining a single-minded focus on obtaining
Q = 10 operation as early as possible.”
HEALTH POLICY


UK push to open up 
patients’ data 


Government faces obstacles to mining medical records. 


BY EWEN CALLAWAY 


In August, posters began appearing in
doctors’ practices across England, urging
patients to say yes to their medical records
being used for scientific research or, more
precisely, not to say no.

The move, now gathering momentum, is
part of a campaign by the UK government,
alongside major research funders such as
the Wellcome Trust in London, to convince
a sceptical public to share their health details
with researchers, through a system in which
patients must expressly opt out. Privacy advocates
are encouraging them to do just that.

The government's plans are part of a shake-up
of health data in the National Health Service
(NHS) in England, the world’s largest
public-health system, that cares for about
53 million people. Following reforms made
in April, it will in the coming weeks begin
radically changing the way it handles patients’
records. This will involve establishing a central
repository to connect hitherto disparate electronic
data from general practitioners’ (GP)
practices, hospitals and disease registries.

Such linkage, already in place in Scotland and
Wales, where the NHS is run separately, will
deliver better health care, the government says,
while establishing the world’s most comprehensive
patient database for research. It could be
used to find new uses for existing drugs, and
speed up the transfer of research to the clinic.

“The potential crown jewels in the UK are
primary-care data that have been electronic for
decades and have been coded for decades and
have wide population coverage, nearly 100%,”
says Harry Hemingway, director of the Farr
Health Informatics Research Institute at University
College London, which was established
this year with funding from the UK Medical 
Research Council to mine such records. Such an 
archive would trump those in the United States, 
and even in Denmark and Sweden, which have 
had central health databases for years. 

The immediate use of the linked data will 
be to help the NHS apportion resources, but 
the government is also keen to make patients’ 
records more useful — and accessible — to 
researchers in academia and industry. Prime 
Minister David Cameron has said that every 
NHS patient should be a research participant. 
His administration is also hoping that access to 
patients’ data will lure drug companies back to 
Britain, and catalyse a health-informatics indus- 
try potentially worth billions of pounds. 


A QUESTION OF CONSENT 


In a survey, 1,396 UK adults were asked: ‘How
willing or unwilling would you be to take part in a
medical research project which involved allowing
access to your personal health information
(medical records), on an anonymous basis?’


Very willing 21%
Fairly willing 39%
Fairly unwilling 14%
Very unwilling 22%
Don't know 4%


OVERALL, 60% OF THOSE SURVEYED WERE WILLING


This autumn, GPs’ records will begin migra- 
tion to a data centre, where they will be linked 
with other data, including already-central- 
ized hospital records. Some of this informa- 
tion — stripped of identifying details or fully 
anonymized — will also be made available to 
approved researchers through a secure portal. 

According to some proponents of the plan, 
patients have little reason to opt out. “People 
think their records are being shared much 
more than they already are,” says Nicola Perrin,
head of policy at the Wellcome Trust, the
UK’s biggest funder of biomedical research. 
She worries that the public in England have not 
been adequately informed about the benefits 
of records sharing, such as improved health 
care, nor about measures intended to protect 
privacy. “I think there is underlying support 
for it, provided one can explain that there are 
safeguards, and that it isn’t your most personal 
secrets that researchers want to get,” she adds 
(see ‘A question of consent’). 

Yet research funders worry that scaremon- 
gering in sections of the press could lead to 
large numbers of people opting out of the 
scheme, diluting its usefulness to research- 
ers. In response, funders plan to become more 
vocal in touting the benefits of health-records 
research, such as very large epidemiological 
studies showing the effectiveness of smoking 
bans and the safety of vaccines. “When you 
explain that all of this research is only possible 
by using patient records, then people change 
their minds,” says Janet Valentine, head of pub- 
lic health and ageing at the Medical Research 
Council, the UK’s publicly funded agency for 
biomedical research, which spent £760 million 
(US$1.2 billion) on research last year. 

Phil Booth, head of a campaign called 
medConfidential that opposes the changes, 
worries that medical research is being used as 
a patient-friendly cover to collect data for other 
uses, such as the administration of social-secu- 
rity benefits. If privacy were compromised — an 
inevitability, Booth says — patients might lose 
faith in research and the NHS. His organiza- 
tion successfully fought for patients to be able 
to opt out. “I think research institutions are 
basically being rather short-sighted in aligning 
themselves with this initiative,” he says.
REPRODUCTIVE BIOLOGY


Regulators weigh benefits of ‘three-parent’ fertilization


But critics say mitochondrial replacement carries safety and ethical concerns. 


BY ERIKA CHECK HAYDEN 


Regulators in the United States are
considering whether to permit trials of
a controversial assisted-reproduction
technique intended to help women to avoid 
passing certain genetic defects on to their 
children. 

On 22 October, the US Food and Drug 
Administration (FDA) is scheduled to meet in 
Silver Spring, Maryland, to discuss a method 
that could prevent transmission of defects in 
mitochondria — cellular components that 
contain a small amount of DNA — from 
mother to child. The defects, which can cause 
fatal developmental conditions, affect as many 
as 4,000 US births a year. 

The technique places nuclear DNA from the 
egg of a woman with a mitochondrial defect 
into a donated egg that has had its nuclear 
DNA removed, but contains healthy mito- 
chondrial DNA. Once the egg is fertilized, the 
resulting embryo would, in a sense, have three
parents, because the donor mitochondrial 
DNA is passed down along with the mother 
and father’s nuclear DNA. 

The FDA was asked to look into the issue by 
developmental biologist Shoukhrat Mitalipov 
at Oregon Health and Science University in 
Beaverton, who last year created early human 
embryos with the technique (see Nature
http://doi.org/n76; 2012). When the manipulated
eggs were fertilized, genetic abnormalities 
were detected in half of them — but seem- 
ingly normal embryonic stem-cell lines could 
be extracted from 38% of the rest. Trying to 
obtain stem cells from unmanipulated eggs 
results in a similar success rate. Mitalipov 
had used the same technique in 2009 to create 
apparently healthy rhesus monkeys. Now he 
wants to begin a clinical trial in humans. 

In 2001, the FDA began to regulate the 
technique as a form of gene therapy after 
researchers used fresh mitochondria in a 
handful of infertile women to help them to 
conceive (see ‘Energizing eggs’). The regula- 
tion was widely, but incorrectly, reported as a 
ban. The FDA asked researchers to apply for 
permission to test the approach in clinical 
trials. But none did — until now. At the time, 
the agency said that the safety data “were not 
convincing”, citing examples of genetic abnormalities
such as a missing X chromosome
in a fetus created with the technique.

The anomalies seen in embryos created with 
mitochondrial transfer could have been due to 
the mothers’ underlying fertility issues rather 
than to the technique itself, says embryologist 
Jacques Cohen, who was scientific director of 
assisted reproduction at Saint Barnabas Medical 
Center in Livingston, New Jersey, when such 
treatments were conducted there. 

But other safety concerns have been raised 
since then. In September, a group of evolu- 
tionary biologists led by Klaus Reinhardt at 
the University of Tübingen in Germany said
that problems could arise if mitochondrial 
and nuclear DNA from different women 
proved to be incompatible. They pointed to 
dozens of experiments in mice, fruit flies and 
other animals in which mixing nuclear and 
mitochondrial DNA from individuals with 
different genetic backgrounds sometimes led 
to reduced growth, early death, fast ageing or 
reduced reproductive ability. 

Mitalipov and other scientists counter that
those experiments were mostly done by mixing
strains of inbred animals. In species such as
humans, individuals from different genetic
backgrounds interbreed freely without ill effects.
“If anything, children born from mixed-race
couples, and [their] successive generations, are
fitter than those from same-race couples,” says
developmental geneticist Robin Lovell-Badge
of the Medical Research Council National
Institute for Medical Research in London.

Mitochondria (green) in egg cells carry an independent lineage of DNA that can pass on genetic defects.

NATURE.COM
For more on mitochondrial replacement, see: go.nature.com/xhdedw

Paul Knoepfler, a stem-cell biologist at the 
University of California, Davis, has a different 
concern: epigenetics. He says that the donor 
egg’s cytoplasm could reprogram chemi- 
cal tags on the nuclear DNA which alter the 
expression of genes. But Mitalipov argues 
that reprogramming will not occur with his 
technique because he is transferring genetic 
material between cells that are in exactly the 
same developmental state. He points to the 
existence of the healthy monkeys that are 
now more than four years old — and are the 
product of mitochondrial transplants across 
different genetic backgrounds — as evidence 
that the technique is safe. 

In March, the UK Human Fertilisation and 
Embryology Authority (HFEA) concluded 
that human trials could be done if, for instance, 
offspring were monitored long-term. The UK 
government is now drawing up regulations 
on the technique, and Parliament, which had
banned all germline modifications, will vote
on whether to allow the procedure next year.

There are also ethical considerations. The
HFEA said that the procedure should be considered
in the same ways as a tissue donation,
and that any resulting child should not have
the right to know the identity of the donor of
the healthy mitochondria. The FDA, unlike
the HFEA, does not consider ethics, and that
worries Marcy Darnovsky, executive director
of the Center for Genetics and Society, an
advocacy group in Berkeley, California. Her
group has opposed such trials, in part because
of concerns that acceptance of the technology
might lead to the selection of embryos with
specific traits for non-medical reasons.

Mitalipov agrees that any trial would need to
proceed with caution, but says that if he cannot
perform the trials in the United States, he would
consider going to the United Kingdom if it
allows the procedure first. “Patients are suffering
the same issues, no matter where they are.”


ENERGIZING EGGS

Experimental fertility treatment faces scrutiny

As the US Food and Drug Administration
(FDA) debates the merits of mitochondrial
replacement in eggs, some observers will
be looking for hints as to how the agency
may regulate another mitochondrial
manipulation — one with fewer ethical and
safety concerns.

OvaScience, a biotechnology company in
Cambridge, Massachusetts, wants to boost
the success rate of in vitro fertilization (IVF)
by infusing eggs with fresh mitochondria.
The mitochondria are harvested from an
IVF patient’s own egg precursor cells, a
cell type discovered by Jonathan Tilly, a
reproductive biologist at Northeastern
University in Boston, Massachusetts. Tilly
says that these precursor cells can be
coaxed to develop into mature eggs in adult
women, challenging the dogma that women
are born with all the eggs they will ever have.
Tilly’s results are disputed (see Nature 491,
318–320; 2012), but OvaScience has long-term
plans to harvest precursor cells and
use them to create fresh eggs for women for
whom conventional IVF has failed.

The company’s first project, called
AUGMENT, is to harvest precursor cells,
isolate their mitochondria, and inject
them into mature eggs to see if they can
revive eggs from infertile women, as work
with mitochondria from donor eggs has
suggested. Mitochondrial DNA from egg-precursor
cells is thought to contain fewer
mutations than mitochondrial DNA in the
eggs themselves. Because OvaScience would
be using mitochondria from a patient’s own
cells, the company hopes to sidestep ethical
concerns raised by ‘three-parent’ embryos.

OvaScience has argued that AUGMENT
involves ‘minimal manipulation’ — the
same injection procedure, for example, is
already used to put sperm into an egg —
and therefore would not need FDA approval
to be deployed in clinics. Regenerative
Sciences in Broomfield, Colorado, has also
argued that one of its stem-cell therapies
involves ‘minimal manipulation’. The FDA
challenged that idea, and its injunction on
the treatment was ultimately upheld in court
(see Nature 488, 14; 2012).

OvaScience investors clearly feared
that AUGMENT would meet the same fate
when, on 10 September, the company
announced that the FDA had issued a letter
questioning whether the project was exempt
from agency review. OvaScience voluntarily
suspended enrolment in the US arm of its
AUGMENT clinical study, pending a meeting
with regulators. The company’s shares fell
23% that day, and have yet to recover (see
‘Egg regs’).

But analyst Jeffrey Cohen of Ladenburg
Thalmann, a financial services company in
Miami, Florida, says that the FDA letter has
not changed his favourable assessment
of OvaScience. The AUGMENT study is
continuing in Europe, he notes, where
the market for IVF is as much as three
times larger than in the United States, and
regulatory hurdles are not expected to be
a barrier.
Heidi Ledford

EGG REGS

OvaScience shares plummeted on 10 September, the
day the company revealed that the US Food and Drug
Administration might regulate its fertility treatment.


CORRECTION 

The print version of the World View by 
George Church (Nature 502, 143; 2013) 
was published before the author had 
approved changes. The online version was 
amended to better reflect his views. 
wants to support science that makes a
difference. But there is no simple formula
for identifying truly important research.
And the job is becoming more difficult. As 
funding gets squeezed, scientists face stiffer 
competition for resources and jobs, and it 
becomes more crucial than ever to develop 
reliable ways of spotting and supporting the 
best work. This week, Nature examines how 
the impact of research is measured — and asks 
whether today’s evaluation systems promote 
the most important science. 

A News Feature on page 288 examines how 
countries are assessing work through elabo- 
rate audit systems. Supporters say that these 
improve overall research quality, whereas crit- 
ics charge that they eat up time and money and 
skew grants towards ‘hot topics’. A second News
Feature on page 291 looks at the influence of the 
leading journals, traditionally recognized as a 
filter for important research. That role is now 
being challenged by changes in the publishing 
industry. And a Careers Feature on page 397 
discusses how grant applicants can articulate the 
potential impact of their research, as required by 
many granting agencies. 
Evaluating research output and judging
which work to fund is getting harder.

One way of assessing the influence of research
is to track citations of papers, but there are 
concerns that such data are often proprietary 
and not easily evaluated, writes David Shotton 
on page 295. Shotton is the director of the Open 
Citations Corpus, a fledgling repository for 
open scholarly citation data. Researchers are 
increasingly producing output — data, videos 
and code, for example — that do not mesh well 
with older systems for evaluating scientific con- 
tributions. On page 298, Mark Hahnel, founder 
of figshare, an online tool that allows researchers 
to publish all their data in a citable, searchable 
and shareable manner, describes the complexi- 
ties of tracking the impact of these diverse 
research products. 

These stories and commentaries show how 
evaluation systems are having to evolve rapidly 
to keep up with changes in the way that science 
is practised and communicated (see Editorial, 
page 271). Separating the best from the rest has 
never been harder.
JUDGEMENT DAY


Many governments are assessing the quality of university research,
much to the dismay of some researchers.


BY BRIAN OWENS 


Two years ago, academics at Lancaster
University, UK, found
themselves in the uncomfortable 
position of being graded. They 
each had to submit the four best pieces of 
research that they had published in the pre- 
vious few years, and then wait for months as 
small panels of colleagues — each containing 
at least one person from outside the university 
— judged the quality of the work. Those who 
failed their evaluations were offered various 
forms of help, including mentoring from a 
more experienced colleague, an early start on 
an upcoming sabbatical or a temporary break 
from teaching duties. 
The university did not undertake this huge 
exercise just to make sure that the researchers 
were pulling their weight. The assessment was
a drill to prepare for the Research Excellence 
Framework (REF), a massive evaluation of the 
quality of research at every university and pub- 
lic research institute in the United Kingdom, 
which is set to take place in 2014. 

The idea of the drill “was to identify areas 
where we could help people develop their profiles”,
says Trevor McMillan, Lancaster University’s
pro-vice-chancellor for research. Happily,
he says, the results suggested that the univer- 
sity would score more highly than it did on the 
most recent national evaluation, in 2008. 

But other mock evaluations have proceeded 
less smoothly. In a survey of more than 7,000 
UK academics published on 3 October by 
the University and College Union (UCU) in 
London, almost 12% reported having been
told that failure to meet their university’s 
REF benchmarks in a drill could lead them 
to be transferred to a teaching-only contract 
before the real REF (see go.nature.com/eqiirr). 
Almost 10% said that they faced denial of pro- 
motion. At Cardiff University, around ten 
academics were pressured to switch to teach- 
ing-focused contracts after they scored poorly 
on a practice exercise, so as not to drag down 
their department, says Peter Guest, an archae- 
ologist at Cardiff and the university’s UCU 
liaison on the REF. This form of game-playing 
is discouraged, but not expressly forbidden, by 
the REF — however, making career decisions 
solely on the basis of the evaluation is against 
the university's own policies, as well as those of 
many other institutions, says Guest. 

All of the Cardiff cases were resolved in a
day or two, with managers being “forcefully 
reminded” of the rules by the UCU, says Guest. 
But the experience shows how tempting it is 
for institutions to make career decisions on 
the basis of predicted REF scores, which are 
highly subjective. This is neither reliable nor 
fair, says Guest. (In response to questions about 
the incident, a spokesman for the university 
said in an e-mail: “We have been running a 
long-term programme for over four years to 
ensure our academic staff are on contracts that 
reflect what they actually do.”) 

Even many academics who did score well 
in the mock evaluations resent them. Around 
the United Kingdom, researchers view these 
national assessments as a bureaucratic imposi- 
tion that can stifle creativity. 


UNDER PRESSURE 

Most academics at Lancaster saw the mock 
REF as little more than a “mildly annoying” bit 
of bureaucracy, but the real thing is a different 
matter. “We have our department's top research 
professor working on preparing our REF sub- 
mission, and it’s taking up about a third of his 
time,” says one member of the mathematics 
and statistics department. “It seems like a waste 
of talent.” Too many researchers are focused on
winning grants and trying to predict what kind 
of work will be rewarded in the next assess- 
ment, rather than doing the best science they 
can, says Dorothy Bishop, an experimental 
psychologist at the University of Oxford, UK. 
“I think a lot of science is just not very well
done these days because people are trying to 
do too many things.” 

But university administrators and the gov- 
ernment have come to rely on these evalua- 
tions to help them decide how to disburse 
funding. And the idea has been so popular 
with educational leaders that other countries 
are following the United Kingdom's example, 
with similar exercises cropping up in Australia, 
Italy, Germany and elsewhere. 

In the late 1980s, the United Kingdom 
became the first country to systematically 
evaluate the quality of its university research. 


The REF is the latest incarnation of these 
check-ups. Previously known as the Research 
Assessment Exercise (RAE), the evaluations 
are widely credited with helping to improve 
the country’s research system. Between 2006 
and 2010, citations of UK articles grew by 
7.2%, faster than the world average of 6.3%; 
and the country’s share of citations grew by 
0.9% per year, according to a 2011 analysis 
conducted by publishing company Elsevier 
for the government. 

The assessment is used by the UK government to
distribute more than £1.6 billion (US$2.6 billion)
a year in block grants to universities. More than
70% of the pot goes to the top-scoring 20 or so
universities — last year, the University of Oxford
got more than £130 million in quality-related
funding — whereas the smallest, least
research-intensive institutions make do with just a
few tens of thousands of pounds. Assessment results
are eagerly assembled into league tables, showing
which universities are performing best in which
subjects (see ‘Top 5’).

“The reputational aspects of it can be as important
as the financial aspects,” says McMillan. Some
smaller institutions that are strong in particular
subjects — as Lancaster is in physics — have
reported that they have an easier time attracting
students in those areas as a result of the
assessments. And it is not just students. “One of
the consequences is that people really want to come
to a department that did well in the RAE,” says
McMillan. “We've found it easier to recruit
high-quality staff in physics.”

For the REF, universities submit a selection of work
from most of their active researchers to one of
dozens of subject-specific panels known as Units of
Assessment that correspond roughly, but not exactly,
to university departments. The panels evaluate the
quality of the research using peer review and
metrics such as citation indexes. And they will
also, for the first time, look at the economic and
social impact of a university's research.

Even critics of the assessments agree that they have
had some positive effects on the country’s research
system. Because the exercises judge academics on the
quality of their research, many departments have
tried to cut back on other demands, such as
administrative work, says Guest. Furthermore, the
results make it clear which departments and
academics are not pulling their weight, and allow
universities to make strategic decisions about how
to invest resources.

Royal Holloway, University of London, faced that
very situation after the first research assessment
in 1986, which ranked the university’s psychology
department in last place nationwide, says Kathy
Rastle, a cognitive psychologist and the
department's director of research. Recognizing that
it would not be able to boost its rating by hiring
established stars, the department sought instead to
attract and develop young talent. “We try to focus
on people we feel have great potential,” says Rastle.

Early-career psychologists at Royal Holloway are now
offered “substantial, but tailored” start-up
packages, she says, with hardly any teaching
commitments for the first two years. They also get
help from more experienced colleagues in preparing
funding proposals.

In the 2008 RAE, after two decades of nurturing
junior staff, the department was ranked among the
top ten in the country. It has ambitions to go even
higher. “I look forward to the REF as an opportunity
to show what we've done, and to move up the ranks,”
says Rastle.


Analysts used information from the 2008 UK Research
Assessment Exercise to rank academic departments by
quality.

CHEMISTRY
1. University of Cambridge
2. University of Nottingham
3. University of Oxford
4. St Andrews/Edinburgh
5. University of Bristol


IMPACT
A Nature special issue
nature.com/impact

© 2013 Macmillan Publishers Limited. All rights reserved


AN IDEA SPREADS
As other countries begin their own national research
evaluations, they hope to achieve the same kinds of
benefits. This year, Italy published the results of
an evaluation begun in 2011 (see Nature
http://doi.org/nrx; 2013); its goal is to increase
meritocracy in the country’s universities, where
academics of the same rank and seniority currently
receive the same salary, regardless of output.
“There are no incentives to improve your research
performance,” says Giovanni Abramo, who studies
bibliometrics and research evaluation at the
National Research Council of Italy in Rome. “Now
some of the money the government gives to
universities will be based on this evaluation.”

The Italian effort evaluates only three journal
articles from each researcher with teaching
commitments, whereas Australia assesses all research
output as part of its Excellence in Research for
Australia (ERA) initiative, most recently in 2012.
Only a relatively small pot of funding rides on the
results: this year, rankings determined the
disbursement of just Aus$68 million (US$64 million).
The outcome is mainly used to give institutions an
idea of where they stand in terms of national and
international quality, says Aidan Byrne, chief
executive of the research council.

The exercise has added benefits, he says. For
example, it helps to verify that the council is
distributing its Aus$800-million competitive-grants
portfolio in a reasonable way. With a round of
assessments costing Aus$4 million, says Byrne, “it’s
a very efficient method of quality control”.
Although there is no formal connection between the
ERA and the grants process, the academics who
peer-review grant applications are aware of ERA
outcomes, and that feeds into their decisions, he
says.


STAND AND BE COUNTED

Assessment of academics has spread throughout the world, but each country does it differently.

UNITED KINGDOM
Name: Research Excellence Framework
Next assessment: 2014
A sample of researchers submits four examples of work
published since 2008; department heads often decide
who is included. Each department provides a
description of the economic and social impact of its
work.
Submissions are evaluated by expert panels that
assign a quality profile to each university. Quality
will account for 65% of the score, impact for 20%
and research environment 15%.
Results are made public and used to distribute the
government's ‘quality-related’ research funding,
which in 2013 was worth more than £1.6 billion
(US$2.6 billion).

GERMANY
Name: Research Rating
Next assessment: Unknown
Four pilots of the Research Rating have been
conducted so far, in chemistry, sociology,
electrical engineering and English and American
studies.
Panels of 15-20 people evaluated the quality of a
selection of publications from each research
institute. Panels also sought to promote young
researchers and technology transfer.
The government is considering a decision this month
on whether to repeat the assessment and expand it to
all disciplines.
The assessments are made public, but will not be used
to distribute funding.

ITALY
Name: National Agency for the Evaluation of the
University System and Research (ANVUR)
Next assessment: Unknown
Researchers submitted three selections of their work
— six if they had no teaching commitments —
published between 2004 and 2010.
The research outputs were evaluated by 14 subject
panels. Science panels made extensive use of
bibliometrics. Large, medium and small universities
were ranked separately, as were research agencies
and inter-university consortia.
The results were made public and were used to
distribute around €540 million (US$729 million) as
part of the 2013 university budget.

AUSTRALIA
Name: Excellence in Research for Australia
Next assessment: 2015
Universities track every piece of research output
from their academics; more than 400,000 pieces were
submitted in 2012.
Output is reviewed by expert panels, using metrics
such as citation counts and patents filed, as well
as research funding and signs of prestige including
researchers’ membership of learned academies.
The results are released publicly to allow
comparisons between institutions, but just
Aus$68 million (US$64 million) is distributed
according to the outcome.


GROWING PAINS
It is too early to know how the newer assessment
efforts in Italy, Australia and other countries will
affect the research environment there (see ‘Stand
and be counted’). But researchers say that they have
seen enough of the long-lived UK programme to know
some of the downsides.

One of the main worries that came up in the UCU’s
survey is the stipulation by many universities that
researchers must have produced four high-quality
publications between 2008 and 2013, says Stefano
Fella, a national industrial-relations official at
the union. Of the academics polled, 67% felt that
they could not produce the required output without
working excessive hours — and 34% said that the
stress was affecting their health. Many have
reported changing how they approach their work, says
Fella — for example, some might have rushed to get a
publication in the assessment period, even if the
work might have benefited from more time. “They
don’t think about the best way to present the work,”
says Fella, “but what would be best for the REF.”

Frederic Lee, an economist at the University of
Missouri-Kansas City, has studied how the UK
research-assessment system has affected his
discipline. He experienced two rounds of assessments
first-hand while working at De Montfort University
in Leicester in the 1990s. He says that economists
who study alternative theories such as Marxism have
been squeezed out because the assessment has
consistently favoured mainstream work at elite
institutions,
published in a small subset of journals. “There
has been a lemming effect that has led to a 
homogeneity of research topics,” he says. 

Lee says that he was never pressured to 
abandon his research on the history of hetero- 
dox economic theories in the United King- 
dom, but was encouraged to submit his work 
to particular mainstream journals, where it 
stood a slim chance of getting accepted. Other 
academics have told him that they have been 
pressured to switch to more conventional 
research topics, and some had been squeezed 
out of departments at major institutions. 
Nature spoke to one economist at the University
of Manchester who studies alternative
theories, and who left the department in part 
because the focus on RAE-friendly theories 
meant that prospects for advancement seemed 
essentially non-existent. 

Academics are particularly worried about 
the move to assess the impact of research in 
the REF. They fear that this signals a preference
for short-term, applied work over basic
research that has no obvious, immediate public
benefit. “As far as I’m concerned, you should
do good science, and not think in this appallingly
strategic way,” says Bishop. “Some good
science takes a long time to do well.”

The time, effort and money being spent 
on submissions are also a major concern: 
preparations for the 2008 RAE cost universi- 
ties £47 million, according to a 2009 review of 
the exercise. Even smaller universities such as 
Lancaster asked several academics to spend 
months reviewing applications for mock 
REFs. The time burden can be even worse 
for administrators, who might have to hire 
extra staff to work on the REF, says Bishop. 
University College London, for example, has 
recruited four editorial consultants to work on 
the impact portion of the assessment. 

McMillan says that it is natural to spend a 
bit more time and money when preparing to 
tackle a new criterion. “It’s a dimension that 
we're not used to.” He adds that administrators 
at Lancaster are hiring external professional 
editors to help with only the final part of the 
process: polishing the case studies and impact 
statements that are written by academics and 
the university's research support office. Still, 
McMillan himself is currently spending two 
to three days a week tweaking Lancaster's
submissions. “I think the REF is probably taking
up more time than previous exercises,” he says.
“The shift to the impact agenda has seen a big
increase in the workload.”

But some universities have seen the benefits 
of all that work. The vast improvements made 
by Royal Holloway’s psychology department 
demonstrate how much periodic evaluations 
can help, says Rastle. “Having the REF hang- 
ing over our heads makes sure we take all the 
steps we can to get the best out of our people.”


Brian Owens is a freelance writer based in 
New Brunswick, Canada.

Jeffrey Rimer has noticed a change in the way other scientists treat
him since his paper on kidney-stone growth inhibitors appeared on 
the cover of Science three years ago. When his colleagues introduce 
him, they often mention his publications or the publicity he has 
garnered, which he interprets as a nod to his Science paper¹. “From the
reaction of colleagues, it’s almost like you've joined a club,” says Rimer, a
chemical engineer and assistant professor at the University of Houston 
in Texas. “Fair or unfair, it’s like you've proved you can do good science.” 

Researchers often say that publishing in prestigious journals can 
make a career. And for decades, the most sought after of the bunch have 
been Nature and Science — broadly read journals that reject more than 
90% of the manuscripts they receive. A paper in one of these journals, it 
is said, can bring job opportunities, invitations to speak, grants, promo- 
tions and even cash bonuses and prizes. Rimer believes that his Science 
paper contributed to his winning a grant from the Welch Foundation, 
a chemical-research funding organization based in Houston, in 2012, 
and he expects that it may help when he seeks tenure at his university. 

His impressions echo what many other scientists say — often with 
gritted teeth — about premier journals. But the publishing world is rap- 
idly changing, and the leading titles are facing increasing competition. 
The push for open-access publishing has gathered steady steam; more 
than 5,000 open-access journals have been launched since Rimer’s 
paper was published in October 2010. These journals, along with the 
more established open-access publications, are attracting a growing 
share of submissions, threatening the hold of the leading journals. 

Beyond that trend, some advocates for the open-access movement 
have specifically attacked Science and Nature, which they label as 
‘glamour journals’. They say that the journals’ prestige is part of a
business model in which hot findings are flaunted as a way to justify their
subscription rates. And many senior scientists worry that too much 
attention is paid to where people publish rather than to what they have 
done — that Science, Nature and similar publications hold too much 
sway over the careers of working scientists. “It’s like a kind of addiction,” 
says Stephen Curry, a structural biologist at Imperial College London 
who has been vocal about the issue on his blog, Reciprocal Space. 

To get a sense of whether the changes in the publishing landscape
have altered the allure and impact of top-tier journals, Nature
interviewed Rimer and several other early-career researchers who
published for the first time in Nature, Science or other journals in
October 2010 (see ‘Views from the lab bench’).

IMPACT: A Nature special issue
nature.com/impact

Publishing in the most prestigious journals can open doors, but their
cachet is under attack.

BY EUGENIE SAMUEL REICH

VIEWS FROM THE LAB BENCH

ANKE BILL
Says that her Cell paper helped her job search.

Several of those researchers say that three 
years on, they feel that getting a paper in a pre- 
mier journal helped their careers in concrete 
ways. Although they cannot know how their 
careers would have unfolded without these 
high-profile publications, what they believe is 
still telling. It is why some of them are reluc- 
tant to join established scientists who say that 
they will not submit to Nature and Science as a
matter of principle, a step many younger 
researchers are unwilling to take. 

Yet critics are working hard to change how 
researchers — and those who assess their 
work — judge the value of different publica- 
tions. Sandra Schmid, head of the cell biology 
department at the University of Texas South- 
western Medical School in Dallas, is one of 
many academics advocating ways to identify 
promising candidates other than simply look- 
ing for leading journals on their CVs. “The 
drive to publish in these journals does more 
harm than good,” she says.


PUBLICATION PUBLICITY 
Ping Chi, a medical oncologist who landed a
paper² in Nature three years ago, says that she
got an important boost towards launching 
a clinical trial of new cancer drugs, which is 
now starting up. Her paper investigated how 
two proteins stabilize the survival of gastro- 
intestinal tumours. Had it been published in a 
lesser-known journal, she says, she might still 
have been hired by Memorial Sloan-Kettering 
Cancer Center in New York, but she probably 
would not have received such a generous start- 
up package and would have spent some of the 
past two years raising funds. Instead, she put her 
energy into persuading her collaborators and 
pharmaceutical companies to support the clinical
trial of a therapy that inhibits the proteins.
The Nature paper, Chi says, helped to
establish her work as a significant advance,
especially because it received media attention
(thanks in part to a press release issued by
Nature’s press office).

YINGJIE PENG
Says that astronomers do not generally care
where papers appear.

ANNELE VIRTANEN
Says that her Nature paper opened doors
outside her field.

In some developing countries, publishing in 
top-tier journals has extra appeal; researchers 
in China and India sometimes receive bonuses 
or salary increases when they get papers into 
Science or Nature. Yingjie Peng, a Chinese-born 
astrophysicist and postdoctoral researcher at the 
Cavendish Laboratory of the University of Cambridge,
UK, says that if he were to seek a faculty
position in China, it would be invaluable to have 
a Nature or Science paper. “Government officials 
may not understand the work — the easy thing 
to do is compare journals,” he says.

Peng argues that publishing in elite journals 
is less important in the United States and the 
United Kingdom. Most astronomers see papers 
as soon as they are posted to the arXiv.org 
preprint server. And where a paper is published 
is not as important as who did the work and 
how technically adept it is. Peng is doing well in
terms of recognition; his paper³ on galaxy evolution,
published in The Astrophysical Journal
three years ago, has already received
more than 150 citations.

The Astrophysical Journal allows longer 
papers than Science and Nature typically would, 
which gave Peng a chance to fully explain his 
method for extracting laws of galaxy evolution 
from data rather than deriving them entirely 
theoretically. He credits the paper with helping 
him to get his position at the Cavendish. 

Anke Bill, a cell biologist at the Novartis 
Institutes for Biomedical Research in 
Cambridge, Massachusetts, had a similar 
experience with her 2010 paper⁴ in Cell, a
specialized journal that is highly prestigious 
in the biological sciences. Her paper focused 
on cytohesins, proteins thought to be involved 
in human lung cancer. Bill says that she and 
her adviser had initially aimed for the wider 
exposure that would come from publishing in 
Nature. But they say that they received a tough 
set of reviews that required more experiments. 
When Bill resubmitted the paper with the extra
data, Nature’s editors decided that the paper
was too long and technical, she says, but Cell
accepted the paper in its expanded form.

JEFFREY RIMER
Says that his Science paper helped him to
win a grant.

PING CHI
Says that her Nature paper helped to start
a clinical trial.

Bill says that beyond the world of biomedical 
science, a Nature or Science paper would have 
boosted her reputation more. But within her 
field, she says, the Cell paper had a big impact. 
It may have helped her to land her current 
position, especially because the laboratory at 
the German university where she did her PhD 
was not well known outside that country. The 
Cell paper showed that she could develop and 
test a promising novel hypothesis. “I got positive
feedback everywhere I applied,” she says.

Other researchers point to the advantages 
of less selective journals, such as PLoS ONE, 
which publishes a high volume of papers 
online. Nicholas Longrich was a postdoc
in palaeontology at Yale University in New
Haven, Connecticut, when he published his
2010 paper⁵ in PLoS ONE showing evidence
of cannibalism in Tyrannosaurus rex. “The fact
that you probably won’t get it rejected and have
to submit elsewhere means you can get your 
work out quickly,” he says. 

Longrich also liked that PLoS ONE is open 
access, which made it easier for his T. rex work
to be read by others. Still, he says that he did 
not land his current job as a lecturer at the Uni- 
versity of Bath, UK, until he published three 
more papers, in subscription-based journals 
(Nature, The Proceedings of the National Acad- 
emy of Sciences and Current Biology). “Did 
Nature help my career more than PLoS? I can't 
prove it, but I think so,” he says. 


MEASURING IMPACT 
Critics of the status quo object to evaluating 
research on the basis of where it is published. 
The shorthand way to do this is by the journal 
impact factor — an index kept by Thomson 
Reuters, an information-services company 
based in New York. A journal’s 2013 impact 
factor, for example, would be computed by sum- 
ming the number of citations garnered this year 
to papers published in that journal in 2011-12 
and then dividing that sum by the number of 
papers the journal published during that span. 
Curry, who received hundreds of comments
on his blog when he criticized impact factors
in 2012, says that Nature and Science may
command high reputations in part because they
have high impact factors (38.6 and 31, respectively,
in 2012), but those figures are averages
that are pulled upwards by a few very frequently
cited papers. It is not rational, he says, for papers 
that are not cited as often to get a boost just 
because they come out in the same journal. 
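
The arithmetic behind that complaint can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the citation counts are invented for a hypothetical journal, the function name `impact_factor` is ours, and the calculation follows the simplified description given above rather than the exact rules Thomson Reuters applies when deciding which items count as papers.

```python
from statistics import median


def impact_factor(citations_this_year, papers_published):
    """Citations received this year by papers from the previous two
    years, divided by the number of papers published in those years."""
    if papers_published <= 0:
        raise ValueError("papers_published must be positive")
    return citations_this_year / papers_published


# Hypothetical journal: citations received in 2013 by each of the
# ten papers it published in 2011-12 (invented numbers).
cites_2013 = [310, 40, 12, 8, 6, 5, 4, 3, 1, 1]

jif = impact_factor(sum(cites_2013), len(cites_2013))
typical = median(cites_2013)

print(f"impact factor: {jif:.1f}")      # mean, dominated by one paper
print(f"median citations: {typical}")   # what a typical paper received
```

In this invented case a single heavily cited paper lifts the average far above what most papers in the journal received, which is the distortion Curry describes.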

Some experts are taking active steps to 
challenge the sway of the leading journals. In 
December 2012, hundreds of scientific leaders, 
funding bodies, journals (including Science, but 
not Nature) and other organizations gathered in 
San Francisco, California, to sign the Declara- 
tion on Research Assessment (DORA), which 
criticizes reliance on the impact factor and 
commits signatories to evaluate research on the 
basis of its scientific merit. Schmid, the Texas 
cell-biology chair, signed DORA and published 
a commentary in Science Careers⁶ saying
that her department will no longer filter
applicants for faculty jobs on the basis of their
publications.

Her department fills one or two faculty 
positions a year and receives as many as 
300 applications for each one. In the past, the 
department weeded out candidates who had 
not published in top-tier journals, but Schmid 
dislikes that approach. “How many brilliant 
scientists are just outside the spotlight?” she 
says. She is now filtering candidates on the 
basis of a covering letter describing their past 
work and how they envision their future. 


RANKINGS RACE 
It is hard to assess how widespread such changes
are, because research evaluations and hiring 
processes are often confidential. But Henk Moed, 
a bibliometrician and scientific adviser at 
Elsevier, a science publisher in Amsterdam, 
suspects that the journal impact factor still 
looms large in many hiring decisions. Evaluators 
may decide privately to average the impact fac- 
tors of the journals listed on a CV as a way to rank 
candidates. He notes that some institutional 
rankings, such as the Academic Ranking of 
World Universities, compiled by Shanghai 
Jiao Tong University, give explicit weight to the 
number of Nature and Science papers an institu- 
tion has produced — making it likely that some 
universities would then begin to rank prospec- 
tive faculty by the same measure. “There is more 
and more evaluation, and a need for researchers
to prove their quality,” Moed says. “Journal
reputations play a role, and that role has increased.”

Others echo Moed’s sense that Nature and
Science papers are often relied upon implic- 
itly. Amy Ruschak, a biochemist and assistant 
professor at Case Western Reserve University 
in Cleveland, Ohio, says that her 2010 Nature
paper⁷ on a cellular apparatus that destroys
toxic proteins was a highlight of her application
for faculty positions and undoubtedly
contributed to her success. “It’s central, but no
one will specifically say that,” she says.


GROWING COMPETITION
Submissions to Nature have risen over the past 16 years, and the journal
has become more selective. But the growth in submissions is slower
than the worldwide increase in the number of published papers.

Moed notes that bibliometricians are trying
to improve measures of journal quality while 
also educating researchers about the value 
and limitations of such metrics. And Stefano 
Bertuzzi, executive director of the American 
Society for Cell Biology in Bethesda, Maryland, 
which spearheaded DORA, says that although 
the current scientific culture unduly rewards 
Nature and Science publications, he thinks that 
the rapid growth of open access to articles online 
will change that. “Open-access articles get read a
lot, so they should gain visibility,” he says.

Visibility is what motivated Olga Momcilovic, 
a cell biology postdoc at the Buck Institute for 
Research on Aging in Novato, California, to 
send her paper⁸ on DNA damage in stem cells
to PLoS ONE in 2010. “Social media and Google
searches list papers by relevance, not by impact
factor,” she says.

There are some signs that the leading jour- 
nals are not keeping pace with the overall 
growth in publishing. According to informa- 
tion made available by Nature and Science, 
submissions to both journals have climbed 
over the past ten years, reaching more than 
10,000 per year for Nature and more than 
12,000 for Science. However, the number of 
articles published worldwide in all journals 
has been rising much more rapidly, suggesting 
that many researchers are looking to publish 
elsewhere (see ‘Growing competition’).

A similar story emerges from data on the 
most highly cited papers. In 2012, Vincent 
Larivière, an information scientist at the
University of Montreal in Canada, studied the
clout of Nature, Science and other top journals
by examining citation statistics⁹. He found
that although these journals are publishing a 
growing number of highly cited papers each 
year, they are not keeping up with the industry 
as a whole; overall, their proportion of the total 
number of highly cited papers is declining.

Nature and Science have press offices that
are more active than those of many other 
journals, however — making it more likely that 
papers published there will receive notice. And 
because electronic publishing has led to a flood 
of online information, journals that can claim 
to be highly selective fill a niche by elevating 
papers worthy of reading, says Larivière.

Annele Virtanen, an aerosol chemist who is 
now an assistant professor at the University of 
Eastern Finland in Kuopio, agrees. She was a 
postdoc in 2010, when she published a Nature
paper¹⁰ showing that organic aerosol particles
that most researchers had assumed were liquid 
were probably solid. The publication opened 
all kinds of doors for Virtanen. The journal's 
visibility meant that climate modellers and 
atmospheric chemists outside her original 
research field saw her paper, and many wrote 
to her, helping to drive her current research in a
more generally relevant direction.

She now has more results and is thinking 
of submitting to Nature again — or to Science. 
Shooting for these publications, she believes, 
means reaching to do excellent research that 
will stand out. “It improves the level of science,” 
she says. “I can’t see so many bad sides.”


Eugenie Samuel Reich reports for Nature 
from Boston.

Open citations


Make bibliographic citation data freely available and substantial benefits will flow, 
says David Shotton, director of the Open Citations Corpus. 


When Heather Piwowar set out in May last year to
investigate whether making research data publicly
available increased the citation rates of articles¹,
she never anticipated the difficulties. Piwowar, a
co-founder of ImpactStory² who is based in Vancouver,
Canada, was at the time a postdoc at Duke University
in North Carolina. Lacking institutional access to
Scopus, Elsevier’s database of scholarly citations,
she eventually obtained access through a
research-worker agreement with Canada’s National
Science Library. But this required her to be
fingerprinted to obtain a police clearance
certificate because she had lived in the United
States. “I wasted days trying to access the citation
data required for my study,” she told me. “It was
just ridiculous.”

Piwowar needed to analyse citation counts
for 10,000 articles, but the other major citation
source, the Thomson Reuters Web of
Science, did not at the time support queries
using PubMed’s unique identifier numbers.
She explains: “Had there been open citation
data, I could have written my own script!”

Steven Greenberg, a neurologist at
Harvard Medical School in Boston, Massachusetts,
had a similar experience when he
set about revealing how hypotheses can be
converted into ‘facts’ simply by repeated citation³.
Greenberg had to construct manually
and analyse a citation network that contained 
242 papers, 675 citations and 220,553 distinct 
citation paths that were relevant to a particu- 
lar hypothesis. Had those citation data been 
readily accessible online, he would have been 
saved considerable effort. Research practice 
suffers because access to citation data is 
currently so difficult. 
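
To give a sense of why such a network is laborious to analyse by hand, the following Python sketch counts citation paths in a toy network. It assumes that a ‘path’ means any chain of one or more citations leading from one paper to another, and that the network contains no cycles; the definition Greenberg used may differ, and the function name and four-paper example are hypothetical.

```python
from functools import lru_cache


def count_citation_paths(cites):
    """Count every directed path of one or more citations.

    `cites` maps each paper to the papers it cites. The network is
    assumed to be acyclic (papers cite only earlier papers).
    """
    @lru_cache(maxsize=None)
    def paths_from(paper):
        # Each cited paper contributes the direct citation itself (1)
        # plus every longer path that continues onward from it.
        return sum(1 + paths_from(cited) for cited in cites.get(paper, ()))

    papers = set(cites) | {c for targets in cites.values() for c in targets}
    return sum(paths_from(paper) for paper in papers)


# Toy network: A cites B and C, B cites C, C cites D.
toy_network = {"A": ["B", "C"], "B": ["C"], "C": ["D"]}
total_paths = count_citation_paths(toy_network)
print(total_paths)
```

Even this four-paper example, with only four citations, contains eight distinct paths, which hints at how quickly the count grows in a network of hundreds of papers.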

In this open-access age, it is a scandal that 
reference lists from journal articles — core 
elements of scholarly communication that 
permit the attribution of credit and inte- 
grate our independent research endeavours 
— are not readily and freely available for 
use by all scholars. 

To rectify this, citation data now need to be 
recognized as a part of the commons — those 
works that are freely and legally available for 
sharing — and placed in an open repository. 
To that end, since 2010 I have led a project 
funded by two small grants totalling £132,000 
(US$212,000) from Jisc (www.jisc.ac.uk), a 
UK information technology research and 
development funding organization, to estab- 
lish and develop the Open Citations Corpus 
(OCC). The OCC is a fledgling repository 
for open scholarly citation data that is now 
seeking sustainable funding to become a 
cornerstone of the digital research infrastruc- 
ture that supports the academic enterprise. 


CLOSED SHOP 

Although alternative metrics for impact and
esteem are being developed⁴, direct citation
remains a keystone indicator of the significance
of an output (see page 298). Scholarly
communication involves the flow of information
and ideas through the citation network, and
analysis of changes in the network over time can
reveal patterns of communication between scholars
and the development and demise of academic
disciplines. Such information is central to
scholarly endeavour. It is also fundamental to
good decision-making about research investment
and strategy, to facilitate innovation, and to
promote growth and prosperity, particularly in
light of the increasingly international nature of
research collaborations⁵.

The most authoritative sources of scholarly
citation data are the Thomson Reuters Web of
Science, which grew from the Science Citation
Index created by US scientist Eugene Garfield in
1964, and which was originally published by the
Institute for Scientific Information (ISI); and
its main commercial rival, Elsevier’s Scopus,
released in 2004. Both have wide coverage of the
leading literature, but because neither is
complete, they are widely regarded as
complementary⁶.

FREEDOM OF INFORMATION
Bibliographic citation data are freely
available from an estimated 4% of
the world’s scholarly literature.

For access to these two resources, UK 
research universities each pay tens of thousands
of pounds a year⁷, with equivalent
sums charged at institutions in other devel- 
oped countries. The exact values of these 
subscriptions are closely guarded industrial 
secrets, and the university librarians who pay 
these fees are bound by confidentiality agree- 
ments from disclosing them. This high cost 
severely disadvantages all those who work 
outside such wealthy institutions, including 
most businesses and the general public. The 
other significant sources of citation informa- 
tion, also run by commercial companies but 
accessible without subscriptions, are Google 
Scholar and Microsoft Academic Search, 
released in 2004 and 2009, respectively. 
Google Scholar’s coverage is wider than 
that of the others, because it includes books, 
theses, preprints, technical reports and other 
non-peer-reviewed ‘grey’ literature. 

All these sources have licence restrictions 
that prevent the re-publication of their cita- 
tion data. For this reason, bibliometrics 
papers are rarely permitted to publish the 
data on which their conclusions are based 
— hampering reuse, validation of findings 
and other advantages of open data. 

Worse, the available citation data are not 
accurate. My own citation record differs 
considerably across the Web of Science, 
Scopus, Google Scholar and Microsoft 
Academic Search. For example, a 2009 paper⁸
on semantic publishing that I co-authored
currently has citation counts of 22, 37, 88 and 16,
respectively, in these four databases. Which to
trust? More worryingly, an earlier
protein-crystallography paper⁹ has three separate
entries in the Web of Science, with citation
counts of 59, 19 and 0, respectively. In my view,
this calls into question the reliability of the
Thomson Reuters Impact Factor, which is based on
such counts.


A SOLUTION
The OCC, as an open repository of scholarly
citation data made available under a Creative
Commons public domain dedication, is attempting
to improve matters. It aims to provide accurate
citation data that others may freely build upon,
enhance and reuse for any purpose, without
restriction under copyright or database law.

We began building the OCC in mid-2010, and
released the first version in
mid-2011. This prototype provided open
access to reference lists from the 204,637 arti- 
cles that then comprised the Open Access 
Subset of PubMed Central (OA-PMC), 
containing 6,325,178 individual references 
to 3,373,961 unique papers. Despite its small 
size, this corpus contains references to about 
20% of all the biomedical literature indexed 
in PubMed that had been published between 
1950 and 2010, including all highly cited 
papers in every biomedical field. Available at 
http://opencitations.net, the OCC is struc- 
tured to enable the information to be easily 
integrated with similar information from 
elsewhere — the data are encoded as Linked 
Open Data using the SPAR (Semantic 
Publishing and Referencing) Ontologies¹⁰
and the latest Semantic Web standards. 
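
As a rough illustration of what a citation looks like once it is expressed as Linked Open Data, the Python sketch below writes citations as N-Triples statements. It assumes the `cites` property of CiTO, the Citation Typing Ontology that forms part of the SPAR suite; the DOIs and the function name are hypothetical, and the OCC's actual data model is not described here and is likely to be richer.

```python
# CiTO's "cites" property, from the SPAR ontologies.
CITO_CITES = "http://purl.org/spar/cito/cites"


def citation_triples(citing_doi, cited_dois):
    """Return one N-Triples statement per reference, of the form
    <citing article> <cito:cites> <cited article> ."""
    subject = f"<https://doi.org/{citing_doi}>"
    return [
        f"{subject} <{CITO_CITES}> <https://doi.org/{cited}> ."
        for cited in cited_dois
    ]


# Hypothetical DOIs, used only to show the shape of the output.
triples = citation_triples("10.1234/a", ["10.1234/b", "10.1234/c"])
for line in triples:
    print(line)
```

Because each statement names both articles by a Web address, citation data from different sources can be merged simply by pooling their statements.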
Other open citations resources exist. The 
two main ones are CiteSeerX (citeseerx.ist. 
psu.edu), which contains around 13,500,000 
references from 1,242,041 articles, primarily 
in computer science; and CitEc (Citations 
in Economics; citec.repec.org), which con- 
tains 13,544,970 references from 545,641 
documents. Together, these resources and 
the OCC have the references from some 
1,980,000 articles — a mere 4% of the esti- 
mated 50 million articles that have been 
published (see ‘Freedom of information’).
We are currently revising the OCC data 
model, improving its hosting infrastructure 
and expanding its coverage, both by updating
the OA-PMC holdings, which have
more than doubled since the initial ingest to 
672,442 articles, and by ingesting citation data 
from the 881,216 preprints in the arXiv server, 
thus adding citations in mathematics and the 
‘hard’ sciences to augment the initial bio- 
medical coverage. Future work will include 
integration with CiteSeerX, harvesting data- 
set-to-article references from the Dryad 
Digital Repository, and extracting references 
from the pre-digital ‘legacy’ literature that is
poorly represented in other citation reposi- 
tories. This applies particularly to fields in 
which such literature is both well organized 
and of enduring value — notably astronomy, 
and biodiversity and biological taxonomy. 
Ideally, references will come directly 
from publishers at the time of article pub- 
lication. Most publishers are sympathetic 
to the idea of putting article reference lists 
outside the journal-subscription paywall, as 
they do copyrighted abstracts. We already 
have agreements with several major journal 
publishers for the future routine harvesting 
of reference data. As well as the ‘pure’ open- 
access publishers, the references from which 
are open by definition, the publishers of 
subscription-access journals include Nature 
Publishing Group (NPG), Oxford Univer- 
sity Press, the American Association for the 


Advancement of Science (which publishes 
Science), Royal Society Publishing, Portland 
Press, MIT Press and Taylor & Francis, all of 
which will make references available either 
from some or from all of their journals. This 
represents a small but growing proportion 
of all the journal articles published in a year.

References will be harvested centrally
from CrossRef, the organization that provides
digital object identifiers (DOIs) for
journal articles, to which these publishers
already submit article reference lists
as participants in its CitedBy Linking
service. However, publishers need to
indicate their consent in the article metadata
for the article’s references to be
made open (see go.nature.com/x4pzta),
because by default references are kept
private. No other action is required; it is
straightforward and free.

The long-term aim of the OCC is to host
citation information for most of the world’s
scholarly literature, in the arts and humanities
as well as the sciences. This will require
a major curatorial effort and underpinning
technical innovation, on the scale of
PubMed, which is run by the US National
Library of Medicine.


OPEN SEASON 

In an ideal world, publishers would host 
their own bibliographic and citation data, 
following the example of NPG (publishers 
of this journal) — the first and currently only 
company to make such information available 
as Linked Open Data, at data.nature.com. 

But there are separate benefits to be gained 
from the aggregation of such data into a single 
corpus. The OCC will provide integrated 
access to citation data from a variety of 
sources, inside and outside traditional schol- 
arly publishing, with clear provenance data. 
It will expose entity relationships, including 
article-to-article, article-to-database and 
database-to-article citations, and will reveal 
shared authorship and institutional mem- 
bership, common funding, and semantic 
relationships between articles, where the data 
are available. 

Once citation data are openly available, 
useful analytical services can be built, includ- 
ing faceted search-and-browse tools, recom- 
mendation and trend identification services, 
and timeline visualization. Some of these we 
have already developed in prototype. The 
OCC’s usefulness for calculating citation 
metrics will, of course, increase in propor- 
tion to its expanding coverage. 

There is one other service that we think 
could be of particular benefit to authors
and editors: a service that corrects erroneous
references. About 1% of references
in published papers contain errors of vary- 
ing severity, ranging from the trivial — for 
example, substitution of ‘beta amylase’ for 
‘B-amylase’ in the reference title, or the 
omission of accents in author names — to 
the more serious, such as errors in the year, 
volume, page numbers or DOI. The OCC 
already uses citation-correction methods 
internally for reference targets that are mul- 
tiply cited, or for which authoritative biblio- 
graphic records can be obtained externally. 
A similar Web service that could detect 
errors in uploaded reference lists might sig- 
nificantly reduce the number of mistakes in 
published papers. 
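
A minimal Python sketch of how such a check might work, assuming that an authoritative bibliographic record is already to hand. The field names, severity labels and example records are illustrative assumptions, not the OCC's actual method.

```python
import unicodedata

# Fields in which a mismatch is treated as serious, following the
# examples given in the text (year, volume, page numbers, DOI).
SERIOUS_FIELDS = ("year", "volume", "pages", "doi")


def strip_accents(text):
    """Remove combining accents so 'Müller' compares equal to 'Muller'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_reference(submitted, authoritative):
    """Return {field: severity} for every field that disagrees with
    the authoritative record."""
    problems = {}
    for field, expected in authoritative.items():
        given = submitted.get(field, "")
        if given == expected:
            continue
        if field in SERIOUS_FIELDS:
            problems[field] = "serious"
        elif strip_accents(given).lower() == strip_accents(expected).lower():
            problems[field] = "trivial"  # differs only in accents or case
        else:
            problems[field] = "check"    # needs a human to look
    return problems


# Invented records for illustration only.
authoritative = {"author": "Müller A", "year": "2012",
                 "volume": "63", "doi": "10.1234/example"}
submitted = {"author": "Muller A", "year": "2021",
             "volume": "63", "doi": "10.1234/example"}

report = check_reference(submitted, authoritative)
print(report)
```

Here the missing accent is flagged as trivial and the transposed year as serious, mirroring the range of severities described above.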


HELP US 

So what next? Just over a decade ago, a 
similar aim for open citation data was held 
by the Open Citation Project (opcit.eprints. 
org), a collaboration between Southampton 
University, UK; Cornell University in Ithaca, 
New York; and arXiv, that ran between 1999 
and 2002. That project developed Citebase, 
a database of citation information, which its 
developers described as “the crown jewel 
of the Open Citation Project”. Following 
the link to citebase.eprints.org today, one 
gets the message “No website currently 
exists at this URL”.

Making the transition from a promising 
academic project to a robust sustainable 
global service is extremely difficult. For the 
OCC to avoid the fate of Citebase, and instead 
grow into a comprehensive and trustworthy 
source of well-curated open citation data serv- 
ing the entire scholarly community across all 
disciplines, it requires champions, managers, 
developers and curators. It also needs genuine 
collaborations with similar endeavours, a 
sustained and sizeable income stream from 
funders, supporters and investors commit- 
ted to achieving a social good rather than a 
financial return, direct support from the pub- 
lishing community, and adoption by a major 
institution or international organization. 
Can you help?


David Shotton is director of the Open 
Citations Corpus and a senior research 
fellow in the Oxford e-Research Centre, 
University of Oxford, UK. 

e-mail: david.shotton@oerc.ox.ac.uk

The reuse factor


The reference is not dead — it is exploding to encompass the full spectrum of research 
outputs from lines of code to video frames, explains Mark Hahnel. 


Researchers are still struggling to find,
manipulate and cite research outputs
other than published papers.
Data-management plans for 
research — detailing what data 
will be created and how, and 
outlining plans for data sharing 
and preservation — are a core 
requisite of all grant applica- 
tions for a long list of US and UK 
funding agencies. These include 
the US National Science Founda- 
tion, the US National Institutes 
of Health, NASA, the UK Bio- 
technology and Biological Sci- 
ences Research Council, the UK 
Medical Research Council and 
the Wellcome Trust. Funders 
require grantees to make all their 
products available in a citable, 
sharable and discoverable man- 
ner. As a result, tools such as the California 
Digital Library's Data Management Plan 
and the UK-based Digital Curation Centre's 
DMPonline have materialized to help opti- 
mistic grant applicants to fill in this section. 

Even with the help of such tools, data- 
management plans are being rejected in 
some grant applications. This is a wake-up 
call. Researchers should be long past think- 
ing that depositing their data in a file-hosting 
service such as Dropbox is sufficient. Yet the 
majority of academics still consider journal 
articles to be the only valid, formal record of 
their research — the main currency for credit. 

In my view, the current model of grouping 
non-image files together in supplemental data 
appended to a paper is laughable. Data are 
the bedrock of a paper and come in myriad 
forms; for instance, videos for research on cell 
motility or biomechanics, or code for climate 
or epidemiology models. Scholars are increas- 
ingly sharing these sorts of raw data through 
repositories such as the Dryad Digital Reposi- 
tory and GenBank, which allow for the cita- 
tion of data sets, videos, genetic sequences 
and other such files that publishers often 
struggle to accommodate. 

In 2011, I set up another such company, 
figshare, to improve the way that the ‘data 
behind the graph’ is disseminated, exploit- 
ing visualization tools such as D3.js and 
Jmol. At figshare, we work with research
groups and publishers to make data reusable, 
reproducible and interactive. (Macmillan 
Science and Education, the publisher of 


this journal, is an investor in figshare.) 

But the generation of huge numbers 
of citable research outputs is confusing 
researchers who are accustomed to citing 
only papers. The FORCE11 Amsterdam 
manifesto on data-citation principles, drawn 
up in 2011 by a community of scholars, 
librarians, archivists, publishers and research 
funders, states: “A data citation in a publica- 
tion should resemble a bibliographic citation 
and be located in the publication's reference 
list.” A quick look at the most recent journal 
citations of data held in figshare shows that 
this recommendation is not enforced by 
publishers or authors: only one in five cited 
the data in the reference list; the rest men- 
tioned it in methods or deposition sections. 

There are standard protocols for citing 
static data such as genomes or clinical-trial 
results. But data is increasingly dynamic,
coming from sensors that monitor, for exam- 
ple, geophysical and atmospheric changes in 
real time. Tracking these feeds in a way that 
is automated and machine-processable will 
lead to improved validation and verification, 
and help to prevent falsification of data that, 
in areas such as climate-change research, has 
led to much unnecessary wrangling.

Scientists should appreciate that making
their research outputs citable enables more of
their research to have quantifiable
impact. To this end, the Research
Data Alliance (RDA) was estab-
lished in August 2012 by a steering
group of funding agencies from
the United States, Europe and Aus-
tralia. The RDA aims to accelerate
and facilitate research data shar-
ing and exchange across multiple
disciplines that have complicated
funder mandates and a need to cite
various unconventional research
outputs. An RDA working group
plans to provide prototypes and
examples. For instance, individ-
ual cells of a spreadsheet or a few
frames of a video can be cited in a
way that does not dilute a paper's
total number of citations.

IMPACT: A Nature special issue. nature.com/impact

298 | NATURE | VOL 502 | 24 OCTOBER 2013

© 2013 Macmillan Publishers Limited. All rights reserved

Early adopters of open-data 
science are already seeing the 
benefits. Computer scientist 
Titus Brown at Michigan State Univer- 
sity in East Lansing, for example, blogged, 
“My career has already been immeasurably 
improved by my openness, including posting 
our software.” And the efficiency gains are 
long overdue. 

I believe that publishers should mandate 
that all the research that goes into forming 
the conclusions of a paper be made openly 
available, when ethically possible. Even better 
would be for raw data to be made available in 
the paper, as F1000 Research, an open-access 
journal, and PLoS journals already do for 
their articles. Publishers should also ensure 
that all citations — to products of all kinds 
— are included in reference lists, and should 
make this bibliometric data openly available 
in a searchable format (see page 295).

Are we witnessing the death of the 
reference? No, we are seeing the birth of an 
exponentially larger number of citations, 
crediting a much wider variety of outputs. 
The end is nigh for the measuring of impact 
using only citations to published papers: this 
is the age of the ‘reuse factor’.


Mark Hahnel is founder of figshare. 
e-mail: mark@figshare.com 
The operations centre that IBM designed for Rio de Janeiro in Brazil helps to coordinate the city’s activities.


INFORMATION TECHNOLOGY 


Slouching towards utopia 


As interests vie for the soul of smart cities, Melanie Moses asks: “Smarter for whom?” 


Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia
ANTHONY TOWNSEND
W. W. Norton: 2013.

Some 3.5 billion people, half of humanity,
now live in cities. Cities magnify
human endeavours: they account for
much more than half of humanity’s pollution,
energy consumption, crime and disease
spread, while also incubating the lion’s share
of innovations, technology, art and entertainment.
A sustainable, equitable future on our
crowded planet will require fundamental
changes in how cities operate. In Smart Cities,
Anthony Townsend examines how information
technology is shaping the development
of ‘smart’ cities.

What makes them smart? Accessible and
efficient services, transportation and infrastructure
are essential to the mix. Bike-share
programmes, for example, exemplify smart
urban problem-solving by reducing traffic
and pollution, encouraging exercise and
providing cheap transport. Information
technology makes it possible to adapt bike
placement to the variable flow of riders, who
use smart phones to find them. And smart
cities need not reinvent the wheel: bike-share
programmes have spread rapidly since the
first large-scale launch in Paris in 2007, with
now hundreds of thousands of bikes in cities
from Beijing to Stockholm. Each scheme has
evolved to meet local needs, leading to the
emergence of solar-powered bike stations,
stationless bike exchanges and new car-sharing
schemes. Pricing, funding and management
also evolve in partnerships between
non-profit organizations, governments
and, increasingly, corporations. (It’s no
secret who funds New York's Citi Bike.)

But corporate involvement is a
mixed blessing. Uproar greeted the recent
revelation that the US National Security
Agency has been monitoring
communications across the globe, but
Townsend cautions that city halls and their 
corporate partners may be intruding on your 
privacy at a more intimate scale. For example, 
in Rio de Janeiro, IBM’s Intelligent Opera- 
tions Center, which was originally designed 
for disaster management, has become a “mis- 
sion control center for mayors” with “people 
looking into every corner of the city, 24 hours 
a day, 7 days a week” in order to reduce urban 
crime and ensure that buses run on time. 
South Korea has teamed up with Cisco Sys- 
tems and invested US$35 billion in Songdo, a 
model city for energy conservation through 
ubiquitous computing. Millions of sensors are
embedded in roads and power grids to track 
and predict people’s movements. Townsend 
argues that a relentless focus on efficiency 
has been “engineering serendipity out of the 
urban equation”, damping the creative spark
that makes cities dynamic and adaptable. 

Townsend contrasts top-down, corporate 
urban management with bottom-up action 
by civic hackers and engaged citizens who 
provide creative, but not always scalable, 
technologies to empower people. Among 
his compelling examples is Access Together, 
a crowdsourced online mapping tool that 
provides information on accessibility for 
disabled New Yorkers. Another example is 
from Nairobi, where in 2009 activists literally 
put the Kibera slum on the map by walking 
the streets with Global Positioning System 
receivers. The lives of a quarter of a million
people suddenly became visible; before that,
Google Maps had shown just a forest. Such
crowdsourced cartography is a first step in 
demanding water, sanitation and other gov- 
ernment services. This is real inclusivity, 
Townsend notes, whereas many smart solu- 
tions deemed successes are developed by, 
and largely for, the privileged. Bike shares are 
great for young commuters, but they don't do 
much for the elderly, disabled or struggling 
families with young children. 
With such caveats, Townsend conveys
a cautious optimism that information 
technology might make cities smarter. But 
the book only nibbles at the edges of fund- 
amental shifts in how data-driven cities 
might operate in future. Top-down versus 
bottom-up approaches to urban develop- 
ment are discussed anecdotally, neglecting 
a deeper analysis of cities that exemplify an 
organic “organized complexity”, as social
critic Jane Jacobs described decades ago. 

Perhaps the history of failed top-down 
urban planning models has left Townsend 
sceptical of systematic and quantitative 
scientific analyses of cities. However, we 
need a scientific understanding of what 
makes cities adaptive, resilient and pros- 
perous, as we create ever more, and ever 
larger, urban environments. 

In The New Science of Cities (MIT Press, 
2013), urban planner Michael Batty pro- 
poses a new approach based on models 
grounded in volumes of data that reveal 
fine details of how individuals behave in 
urban environments. Equally important 
is a growing understanding of cities as 
dynamic systems driven by top-down and 
bottom-up processes. The macroscopic 
analysis of cities led by Luis Bettencourt 
and Geoffrey West at the Santa Fe Insti- 
tute in New Mexico reveals regularities in 
how cities are distributed and grow, and 
common patterns in how transportation,
the pace of innovation and economic activity
vary across cities
of different sizes.
This work, and my own, suggests that 
urban energy and information flow are 
governed by basic physical principles, even 
given cities’ different histories, politics and 
cultures. To understand cities, we need not 
just the abundant data that sensored cities 
will produce, but also a new framework 
to understand how individual stories are 
woven into vibrant urban systems. 

On a rapidly urbanizing planet, smart 
cities will dominate the human cultural 
landscape and affect how we live, consume 
resources and manage the environment. 
Cities will need to be smarter than the sum 
of their parts and founded on more than 
routers, protocols and social networking 
apps. Townsend begins a conversation, but 
we owe it to ourselves to develop a quanti- 
tative, integrated science of cities to guide 
our vision of how we will grow, govern, live 
and work in tomorrow’s smart cities. 




Melanie E. Moses is an associate 
professor of computer science and biology 
at the University of New Mexico and 
external faculty of the Santa Fe Institute. 
e-mail: melaniem@cs.unm.edu 


QUANTUM PHYSICS 


Packet man 


Graham Farmelo delights in a study of Albert Einstein’s 
under-appreciated contributions to quantum theory. 


Einstein and the Quantum: The Quest of the Valiant Swabian
A. DOUGLAS STONE
Princeton University Press: 2013.

In 1941, US physicist John Wheeler
visited Albert Einstein, the arch quantum
sceptic, at his home in Princeton,
New Jersey. Wheeler
was hoping that the beauty of the new version 
of quantum theory developed by his brilliant 
student Richard Feynman would persuade 
Einstein to accept that the theory was simply 
a natural development of well-founded clas- 
sical ideas. The sage of Princeton listened in 
silence as Wheeler set out his case, but after- 
wards was no more enthusiastic. “Of course, 
I may be wrong,” he said, “but perhaps I have
earned the right to make my mistakes.” 

Einstein was by that time a semi-detached 
member of the physics community, admired 
much less for his current work than for his 
achievements. Many of his colleagues thought 
his views on quantum theory cranky — Rob- 
ert Oppenheimer dismissed them as “cuckoo”. 
That opinion is sometimes echoed today in 
popular books, many of which underestimate 
his contributions to the theory. 

In Einstein and the Quantum, Douglas 
Stone attempts to put that right. He describes 
Einstein's work on the theory using few equa- 
tions, combining scientific and biographical 
accuracy with wide accessibility. Stone, a 
distinguished condensed-matter physicist at 
Yale University in New Haven, Connecticut, 
brings a wealth of physical insight and — less 
predictably — an impressive familiarity with 
the work of leading Einstein scholars. 

In 1900, Max Planck introduced the revo- 
lutionary idea of energy quantization in the 
interaction between matter and radiation in 
black bodies. But, as Stone explains, it was 
Einstein who first understood the implica- 
tions. In 1905, the 26-year-old physics wizard 
radically suggested that the energy of electro- 
magnetic radiation is transferred in the dis- 
crete amounts that Planck called quanta. For 
physicists of the day, long familiar with James 
Clerk Maxwell’s wave description of light, 
Einstein's notion was beyond heretical. Few 
leading theoreticians took it seriously, least 
of all Planck.

Even Einstein wavered. He strove for years 
to understand radiation quanta, for exam- 
ple by tinkering with Maxwell’s equations of 
electromagnetism. Eventually he abandoned 
this approach, having introduced the useful 
but murky concept of wave-particle duality. 
Albert Einstein at his home in Berlin.


Yet, more than any other scientist, Einstein 
ran with the quantum idea. Applying it to the 
vibrational energies of atoms, he used it to 
predict that the specific heats of solids should 
vanish as the temperature is lowered towards 
absolute zero. Quoting an early statement of 
Einstein's about atomic energy, Stone adds 
with characteristic pith that energy quantiza- 
tion “is not a mathematical trick; it is the way 
of the atomic world. Get used to it.” 

Each of the 29 chapters in Einstein and the
Quantum is brief, pacey and lucid (although
some titles are perhaps too clever: for exam- 
ple, ‘Stalking the Planck’). The breadth and 
depth of Einstein’s contribution in this area 
becomes overwhelmingly clear. Eleven years 
after his first great paper on the subject, he 
delivered a theory of transitions that intro- 
duced into quantum theory the idea of prob- 
abilities, which he came to despise. Finally, in 
1924, he built on the thinking of Indian physi- 
cist Satyendra Bose about quantum gases and 
predicted that, under some conditions, a high 
proportion of particles could occupy the low- 
est quantum state, enabling quantum effects 
to appear in the everyday world. This was 
later called Bose-Einstein condensation and 
was first observed experimentally in 1995. 

Stone covers all this with clarity and even 
tackles Einstein’s little-known 1917 paper 
on the quantization of chaotic systems. This 
chapter will probably leave non-specialists 
scratching their heads, but it is worth a read 
because it demonstrates that there is more to 
Einstein’s oeuvre than even most quantum 
physicists know. Stone concludes that Ein- 
stein’s work was worthy of four Nobel prizes, 
and it is a measure of the book’s achievement 
that his claim sounds quite reasonable. 

It was left to Werner Heisenberg, Erwin 
Schrödinger and Paul Dirac to set out the
full-blown quantum theory of matter in the 
mid-1920s. Einstein was a formidable critic 
of the theory, although he was always outwit- 
ted in argument by his friend Niels Bohr — a 
topic treated only briefly in the book, prob- 
ably because this ground is so well-trodden. 
Yet all the originators were indebted to Ein- 
stein’s thinking. As Max Born later said, he 
was “clearly involved in the foundation of 
wave mechanics and no alibi can disprove it”. 

In old age, Einstein seemed indifferent to 
his reputation as a fuddy-duddy, but the criti- 
cisms may have hurt more than he let on. I 
have often wondered how he felt when he saw 
the Princeton University Players’ production 
of William Shakespeare's The Tempest in July 
1953, especially when Prospero contemplates 
the fleeting nature of existence that leaves 
“not a rack behind”. Einstein died less than
two years later. He was proud to have built the 
great edifice of relativity, but still profoundly 
dissatisfied with quantum theory, which he 
was confident would be superseded. 

Was he wrong? Some theoretical physi- 
cists are now speculating that space and time 
might in some sense emerge from the more 
fundamental quantum, so it may be that sci- 
entists will one day regard Einstein’s great- 
est achievement as pioneering a theory he 
believed was terribly flawed. In the meantime, 
Stone’s rewarding book helps us to appreciate 
the remarkable extent of that feat.


Graham Farmelo is a By-Fellow at Churchill 
College, University of Cambridge, UK. 
e-mail: graham@grahamfarmelo.com 


Books in brief 


Junkyard Planet: Travels in the Billion-Dollar Trash Trade 

Adam Minter BLOOMSBURY (2013) 

Junk really is filthy lucre: the basis of a global scrap trade
worth up to US$500 billion a year, writes Adam Minter. Scion of a
professional recycling family, Minter anatomizes this complicated, 
half-hidden industry that he argues is, even at its dirtiest, greener 
than harvesting raw resources. He focuses on scrap metal, a prized 
commodity now recycled in innovative ways, and the kingpins of the 
trade. Leonard Fritz, for instance, rose from extreme poverty to run 
the Michigan-based Huron Valley Steel Corporation, which annually 
processes almost half a million tonnes of shredded automobile. 


Cut It Out: The C-Section Epidemic in America 

Theresa Morris NEW YORK UNIVERSITY PRESS (2013) 

Birth by Caesarean section is expensive and carries a higher risk 
of medical complications than vaginal birth. Yet in 2011, 33%
of US births were by Caesarean. To investigate why, sociologist
Theresa Morris crunched the numbers and interviewed more than 
100 medical staff and mothers. The culprit, she concludes in this 
excellent and detailed study, is a risk-averse US medical culture that 
favours heavily managed births — such as the overzealous use of 
fetal heart monitors, which restrict the mother’s movement — and 
that frowns on women having vaginal births after Caesareans. 


Shores of Knowledge: New World Discoveries and the Scientific 
Imagination 

Joyce Appleby W. W. NORTON (2013)

A sea change gripped Europe from the late 1400s as word of the
thrillingly strange New World spread. Maps were redrawn and the 
‘book of nature’ swelled with new species, from penguins to chillies. 
In a history stretching from Christopher Columbus to Charles Darwin, 
Joyce Appleby reveals how a thirst for empiricism grew with the need 
to sift out tall tales from genuine reportage. She treads the trail of 
paper and specimens left by the likes of ethnographer Bernardino de 
Sahagún and “first ecologist” Alexander von Humboldt.


To the Letter: A Journey Through a Vanishing World 

Simon Garfield CANONGATE (2013) 

The letter — that pillar of the historical record — may itself soon be 
history. As Simon Garfield reminds us in this elegy to the post, letters 
uniquely revivify past eras and the psychological complexities of 
people living through them. The first stirrings and exponential rise 
of e-mail are touched on, but Garfield’s focus is the physical missive 
and the depth of thought it allows. From wooden tablets dug up at 
the ancient Roman garrison Vindolanda, UK, to the epistolary gems 
of novelist Virginia Woolf, this is a billet-doux to two millennia of the 
impassioned, often life-changing power of private correspondence. 


Survive! Inside the Human Body, Vol. 1: The Digestive System 
Gomdori co., Suk-young Song and Hyun-dong Han NO STARCH PRESS
(2013) 

From volcanic burps to colonic bacteria, this comic-book ride 
through the human digestive system is a delirious joy for pretty 
much everyone aged eight and over. Hyun-dong Han’s lurid images 
and zippy text by Suk-young Song deliver on facts even as they 
shamelessly milk the ‘yuck’ factor. Take the plunge with hero Geo and 
“self-proclaimed genius” Dr. Brain as they shrink and are sucked into 
the ever-hungry Phoebe: the ultimate inside story. Barbara Kiser
The eruption of Mount Pinatubo in the Philippines in June 1991 cooled the globe.


One cool solution 


Nicola Jones finds a treatise on a proposed global- 
warming fix intriguing, but isn’t converted to the cause. 


A Case for Climate Engineering
DAVID KEITH
MIT Press: 2013.

It is 2030, and a fleet of ten Gulfstream
business jets have been converted to a
new purpose: dispensing 250,000 tonnes
of sulphates a year into the lower stratosphere.
Here the chemicals form sunlight-reflecting
droplets that cool the planet below.
This artificial volcanic haze, costing about
US$700 million a year, helps to mitigate the
warming effect of greenhouse gases.

This is the vision of David Keith, a physicist
and self-described “oddball environmentalist”.
His company, Carbon Engineering, is
currently pursuing a device that can suck carbon
dioxide out of the air (see Nature 458,
1094-1097; 2009). Now, in A Case for Climate
Engineering, Keith considers the partner
concept of cooling the world by deflecting
sunlight. Unlike journalist Eli Kintisch’s 2010
book Hack the Planet, this is not an exploration
of all geoengineering schemes told
through compelling stories and characters.
Rather, it is a straightforward presentation
of one controversial, planet-altering idea,
along with an intriguing discussion of the
philosophical roots of environmentalism.

Mopping up carbon dioxide, as Keith’s
company aims to do, is expensive but carries
no global risk. Conversely, sulphate spraying
is very cheap: the cost
of a decade's worth of 
Sun-blocking might 
be just a few billion 
dollars, compared to the $300 billion being 
spent per year globally on clean-energy 
technologies, writes Keith. But the possi- 
ble downsides range from disrupting rain- 
fall and interfering with the ozone layer to 
creating acid rain and removing political 
incentives to reduce emissions. 

Keith is a member of the ‘geo-clique’, a
small group of researchers who have spent 
serious time considering such schemes. His 
work has earned him everything from being 
named a 2009 ‘Hero of the Environment’ by 
Time magazine to outrage from colleagues — 
and even death threats. Geoengineering has 
been labelled crazy: geophysicist Ray Pierrehumbert
called it “barking mad”. And just last
month, former Royal Society head Martin 
Rees dubbed it “an utter political nightmare”. 

Sulphates are certainly effective coolants. 
Atmospheric sulphur dioxide from the 
1991 eruption of Mount Pinatubo cooled 
the globe by 0.5 °C in less than a year. Since 
the 1970s, a few scientists have explored the 
logistics and risks of mimicking this process. 
It has become apparent, for example, that
counteracting all greenhouse-gas warm- 
ing with sulphates would result in a world 
with less rainfall. Keith’s group has found, 
however, that countering just a fraction of 
the world’s warming with sulphates could 
mitigate the rainfall changes caused by our 
current emissions of greenhouse gases. For 
this reason, among others, Keith prefers to 
investigate a Sun-shading scheme that aims 
to counter just half of the world’s temperature 
rise (for example, reducing a catastrophic rise 
of 4 °C to 2 °C, the “upper bound of accept- 
able warming”), and only for a limited time. 

But there are problems. The cooling action 
of sulphates doesn't follow the same pattern 
as the warming action of carbon dioxide: 
sulphates are more effective during the 
day and in summer, for example, whereas 
carbon dioxide acts all year round. Nor can 
sulphates counteract ocean acidification. 
And the resulting air pollution would proba- 
bly cause thousands of deaths each year from 
asthma, heart disease and lung cancer, writes 
Keith. There are few facts and figures in the 
book with which to weigh up these dangers: 
the risk of acid rain, for example, he simply 
dismisses as “relatively small” compared to 
the hazards of rapid climate change. 

Strangely unmentioned in Keith’s book is 
the United Nations ban against geoengineer- 
ing. In 2008, the United Nations Convention 
on Biological Diversity enacted a morato- 
rium against iron fertilization — the idea of 
sprinkling iron into the ocean to feed phyto- 
plankton and thus soak up atmospheric car- 
bon dioxide. This ban was extended in 2010 
to all temperature-fiddling geoengineering 
projects, including Sun-blocking schemes. 

The moratorium, which is intended to pre- 
vent commercial projects or large-scale efforts 
while allowing for small-scale experimenta- 
tion, has been used by protestors to try to 
block all iron-fertilization experimentation. 
Keith’s book might be a pre-emptive strike 
against things going the same way for solar 
geoengineering; he and his colleagues are 
contemplating small-scale experiments now. 

Keith’s goal with this book, he writes, is 
to convince us that deciding whether or not 
to pursue geoengineering is a “hard choice”,
not an easy decision based on instinctive 
revulsion. He succeeds, but this is a low bar. 
Where he fails is in convincing us to agree 
with him that “deliberate haste” is called for.

General readers will probably find A Case 
for Climate Engineering a bit dry; academics 
might judge it too light. But it should stand 
as an enticing starter for the researchers who 
Keith hopes to rope into the field, thereby 
expanding the ‘geo-clique’ and helping to 
push the science into the mainstream.


Nicola Jones is a freelance science journalist 
based near Vancouver, Canada. 
e-mail: nkjones@gmail.com 


ARLAN NAEG/AFP/GETTY 


Correspondence 


No bias behind 
pollinator research 


We disagree with Ian Boyd's 
implication that bias may have 
influenced the commissioning 
and publication of research on 
pollinator declines (Nature 501, 
159-160; 2013). 

Our paper on falls in European 
bee-species richness (J. C. 
Biesmeijer et al. Science 313, 
351-354; 2006), along with 
others on honeybee colony 
collapse and bumblebee declines, 
prompted widespread public 
concern. Subsequent decisions 
in continental Europe and the 
United Kingdom to commission 
further research in this area 
therefore seemed sensible and 
proportionate. 

These calls for research 
used “pollinator declines” as a 
convenient shorthand, not to 
steer the work. This is borne 
out by results from the studies 
funded, including our own, 
indicating that past declines 
in some pollinator groups 
may have recently slowed or 
even partially reversed (L. G. 
Carvalheiro et al. Ecol. Lett. 16, 
870-878; 2013). 

Publication bias undoubtedly 
occurs, but it can be identified 
only by reviewing whole fields, 
not individual papers. This 
should be addressed as part of a
systematic review when policy 
issues arise, as carried out by the 
UK’s Parliamentary Office of 
Science and Technology or by 
independent research teams. 

We agree that there are 
uncertainties in our conclusions, 
as Boyd suggests; indeed, 
our papers list strong caveats 
pertaining to our data sets and 
methodology, which were largely 
ignored by the media. 

A national pollinator- 
monitoring programme, 
recommended recently in a
parliamentary report, would 
provide much more robust 
estimates of pollinator trends in 
future. 

William E. Kunin* University of 
Leeds, UK. 
w.e.kunin@leeds.ac.uk 


*On behalf of 5 co-signatories (see 
go.nature.com/setpu6 for full list). 


Genetic engineering 
in conservation 


Species bearing genetically 
engineered adaptive variants 
that are intended to save them 
from extinction might differ
in important respects from the
original species designated for 
protection — with unpredictable 
ecological consequences (see
M. A. Thomas et al. Nature 501,
485-486; 2013). 

Introducing adaptive variants 
by genetic engineering might 
work for some long-lived plants 
in which disease resistance is 
primarily due to a single gene 
(J. M. Adams et al. Conserv. Biol.
16, 874-879; 2002), and for 
economically important traits 
in agricultural crops grown 
in controlled environments. 

In wild endangered species, 
however, identifying ‘missing’ 
adaptive single-gene variants 
and increasing their frequency 
without causing negative side 
effects is almost certain to prove 
impossible. 

Genetically based inferior 
fitness in endangered populations 
— including in the Florida 
panthers and Swedish vipers cited 
by Thomas and his colleagues — 
has been traced to an increased 
frequency of detrimental alleles 
from inbreeding or a loss of 
genetic variation, and not to a lack
of adaptive variants. 

Improving connectivity 
with outside populations 
would rescue the fitness of 
such endangered populations 
by introducing greater genetic 
variation, non-detrimental 
variants, and adaptive alleles that 
have already been well tested by 
evolution. 

Philip W. Hedrick Arizona State 
University, Tempe, USA. 
philip.hedrick@asu.edu 

Fred W. Allendorf University of 
Montana, Missoula, USA. 

Robin S. Waples National 
Marine Fisheries Service, Seattle, 
Washington, USA. 


Safeguard species in 
warming flatlands 


To protect the biodiversity of 
flatlands against the effects of 
climate change (M. Tingley
et al. Nature 500, 271-272;
2013), we need strategies to 
buy time for species to adapt 
to warmer environments or to 
move to cooler ones. This will 
mean adding more protected 
areas in cool regions and 
improving connectivity between 
protected sites. 

One way to increase resilience 
among resident communities 
would be to reduce the intensity 
of summer grazing on flatlands. 
Shade from tall, dense swards 
helps to cool the soil by up to 
5 °C (J. A. Thomas et al. Science
325, 80-83; 2009), an effect that 
is enhanced as the land becomes 
more uneven (J. Settele and E. 
Kühn Science 325, 41-42; 2009).

Conservation measures 
in existing protected sites, 
as in Europe’s Natura 2000 
programme (see go.nature.com/ 
ykf7vt), remain important but 
may prove inadequate on their 
own and will need to be adapted 
and revivified as the climate 
warms. 

Josef Settele, Ingolf Kühn
Helmholtz Centre for 
Environmental Research, Halle, 
Germany. 

josef.settele@ufz.de 

Jeremy A. Thomas University of 
Oxford, UK. 


Metaphors advance 
scientific research 


As a former collaborator in 
Eleonore Pauwels’ research on 
the misuse of metaphors by 
synthetic biologists, I agree with 
many of her points but find 
her perspective too restrictive 
(Nature 500, 523-524; 2013). In 
my view, the use of analogies, 
concepts and metaphors is 
crucial for advancing scientific 
research. 

Pauwels tends to merge 
metaphors with analogies 
and theoretical concepts. Her 


examples of oscillators, switches 
and logic gates, which have a 
precise meaning in engineering, 
are better viewed as the 
analogical transfer of a scientific 
concept (see also B. Calcott 
Nature 502, 170; 2013). To treat 
them as though they were on 
a par with expressions such as 
‘selfish gene’, ‘software of life’ or 
‘household of nature’ does not 
capture the ways in which they 
are used in scientific practice. 

Metaphors and analogies have 
long driven cross-disciplinary 
exchange. For example, the early 
mathematization of biology and 
economics in the nineteenth and 
twentieth centuries was largely 
built on analogies with physics. 
Analogies and metaphors have 
also contributed substantially 
to ideas developed in cognitive 
science and in the philosophy 
and history of science (reviewed 
in J. Maienschein et al. Isis 99, 
341-349; 2008). 

Andrea Loettgers University of 
Geneva, Switzerland. 
andrea.loettgers@unige.ch 


Keep PubMed 
running at all costs 


With more than 23 million 
citations to date from 
MEDLINE, life-science 
journals and online books, 
the giant National Institutes 
of Health database PubMed 
is arguably the most valuable 
tool available to biomedical 
scientists. Its vulnerability 
has been highlighted by this 
month’s partial US government 
shutdown, with only minimal 
updates and maintenance to 
PubMed possible. 

To avoid lapsing into another 
dark age of research, the ongoing 
maintenance of PubMed must 
be guaranteed. We urge the 
scientific community to push for 
PubMed to be entirely supported 
and commissioned by an 
international forum. 

Alex W. Hewitt, David A. 
Mackey Lions Eye Institute, 
Perth, Australia. 
hewitt.alex@gmail.com 

OBITUARY 


David Barker 


(1938-2013) 


Epidemiologist who traced roots of chronic disease to early life. 


David Barker was one of the most 
influential clinical epidemiologists 
of our time. He challenged the idea 
that chronic disorders such as diabetes and 
cardiovascular disease are explained only by 
bad genes and unhealthy adult lifestyles. His 
‘Barker hypothesis’ proposed that the fetal 
environment and early infant health 
permanently programme the body’s 
metabolism and growth, and thus 
determine the pathologies of old age. 
Initially controversial, his ideas trig- 
gered an explosion of research world- 
wide into the relationship between 
early development and adult disease. 

Barker died suddenly from a cer- 
ebral haemorrhage on 27 August, 
aged 75. Born in 1938, he was educated 
at Oundle School, UK. His biology 
teacher encouraged him to roam the 
countryside hunting for beetles, and 
gave him access to the labs to classify 
his finds. When Barker left school, he 
led a project on the Icelandic island of 
Grimsey to collect plant specimens 
for the Natural History Museum 
in London. 

During his medical training at 
Guy’s Hospital in London, he took 
a year out to complete a bachelor’s 
degree in physical anthropology, 
comparative anatomy, embryology 
and mammalian biology. He stud- 
ied under the eminent zoologist 
J. Z. Young, and published his first 
paper, on the effects of testosterone 
on bone density, in Nature in 1962. 

In 1966, Barker completed his 
PhD thesis on prenatal influences 
and subnormal intelligence, a forerunner to 
his later work on fetal programming. With 
a Medical Research Council (MRC) grant, 
he worked in Uganda on Mycobacterium 
ulcerans infection (Buruli ulcer). When the 
country descended into crisis in the 1970s 
under President Idi Amin, Barker, fear- 
ing for his family’s safety, fled through the 
night with his wife and four young children 
into neighbouring Kenya. By then, he had 
done enough research to link the trans- 
mission of Buruli ulcer to wounds caused 
by the razor-sharp reeds growing near 
the river Nile, proving that it was not an 
insect-borne disease. 

In 1972, Barker joined the University of 
Southampton, UK, where he remained for 
the rest of his career. An inspiring teacher, 
he set up an annual course on epidemiology 
for clinicians, with fellow epidemiolo- 
gist Geoffrey Rose. The course remains 
the definitive introduction to the field for 
researchers. In 1984, Barker became director 
of the MRC Environmental Epidemiology 
Unit in Southampton. 


The unit’s detailed mapping of mortal- 
ity trends in England and Wales led to his 
observation that areas with the highest 
infant mortality in 1910 had the highest rate 
of cardiovascular deaths in the 1970s. With 
his lifelong research partner, Clive Osmond, 
Barker developed his hypothesis that an 
adverse environment in the womb, and 
during infancy, was causally linked to 
chronic-disease risk later in life. 

He devoted the next three decades of his life 
to the pursuit of this idea. And through vari- 
ous and diverse collaborations, Barker made 
many significant advances. With colleagues at 
his MRC Unit and in Cambridge, he showed 
that people of lower weight at birth and in 
infancy were more prone to cardiovascular 
disease, hypertension and diabetes in middle 
age. Collaborating with researchers from the 
Helsinki Birth Cohort study, which tracks the 
long-term health impacts of early growth, he 
linked patterns of childhood growth to these 
conditions. And with colleagues in India, he 
showed similar relationships in developing- 
world populations. His work also formed the 
basis for linking the until then separate 
worlds of fetal physiology and epide- 
miology, bringing together physiolo- 
gists from Australia, New Zealand 
and Canada who were studying fetal 
development in animals. Ultimately, 
Barker published 500 research papers 
and 10 books, and had numerous 
honours to his name. 

David never retired. After stepping 
down as director of the Southampton 
unit in 2003, he continued to work 
at what is now called the MRC Life- 
course Epidemiology Unit, and to 
contribute to the studies that he estab- 
lished — the Hertfordshire Cohort 
Study and the Southampton Women’s 
Survey. He also helped to set up the 
Southampton Initiative for Health to 
find practical ways of improving the 
diets of low-income mothers in the 
United Kingdom. In his last few years, 
he turned his focus to the placenta 
as the channel of communication 
between mother and fetus, and spent 
several months a year working at the 
Oregon Health and Science Univer- 
sity in Portland — a leading placental- 
research centre — and at Emory 
University in Atlanta, Georgia, work- 
ing on the biology of human growth. 

Of all his characteristics, I, like 
other colleagues, will most fondly remember 
David’s humour; he was a brilliant raconteur 
and often sought as an after-dinner speaker. 
He was a private, thoughtful and caring man, 
for whom family life was central. After the 
death of his first wife, Angela, he married 
Jan in 1983. Their home housed four gen- 
erations, and became a centre for scientific 
work, to which they welcomed visitors from 
around the world. 


Cyrus Cooper is director of the Medical 
Research Council Lifecourse Epidemiology 
Unit, University of Southampton, UK. 
During the 1980s, he worked as a clinician 
for David Barker, who supervised his 
doctoral research at the MRC Unit. 

e-mail: cc@mrc.soton.ac.uk 

FORUM Genomics 


Comparisons across cancers 


Analysis of cancer genomes is moving beyond the confines of a particular disease — researchers are now comparing the 
genetic and epigenetic characteristics of multiple tumour types. Two scientists comment on what such studies can teach 
us about cancer biology and how they may guide clinical practice. SEE ARTICLE P.333 


THE PAPER IN BRIEF 

• Research networks around the world 
are cataloguing DNA mutations, chemical 
changes to DNA-associated proteins, and 
expression of RNA transcripts and proteins in 
thousands of human tumours. 

• In a series of 16 papers, one of which 
is published on page 333 of this issue 
(Kandoth et al.), The Cancer Genome 
Atlas (TCGA) Research Network presents 
comparisons of such data across as many 
as 12 tumour types (Fig. 1). 

• The publications join other pan-cancer 
efforts in revealing commonalities 
between all cancer types, shared 
molecular abnormalities in tumours 
that superficially seem distinct, and 
mutations that are confined to specific 
tumours. 

• The findings will guide the development 
of prognostic, diagnostic and therapeutic 
strategies. 


Order from 
disorder sprung 


ALAN ASHWORTH 


During the past few years, the advent of 
hugely powerful DNA-sequencing tech- 
nologies has delivered unprecedented insight 
into the nature of cancer genomes. Hundreds 
of examples of genomes from several cancer 
types have already been produced, and this 
process will continue so that a definitive 
overview of cancer genomics can eventually 
be achieved. Nevertheless, it seems apposite 
to take stock of the themes that are emerging 
from comparisons of the genomes of different 
tumour types — studies that are giving us a 
fascinating first peek at the common muta- 
tional events and processes that shape cancer 
genomes. 

The first impression that emerges from 
these comparisons is of the tremendous vari- 
ation. Some types of cancer have, on average, 
relatively few genetic changes, whereas others 
show extraordinary mutational complexity. 
It is likely that most mutations in cancer 
genomes represent collateral damage that is 
unrelated to pathogenesis, but studies seeking 
candidates for driver mutations — those that 
contribute to the disease state — are reveal- 
ing that both the number and nature of these 
candidates also differ considerably between 
cancers. In some cases, we are seeing dis- 
tinct cancer types with alterations in the same 
cellular pathway brought about through driver 
mutations in different genes. 

Mutual exclusivity of mutations in genes or 
pathways is also becoming apparent, provid- 
ing clues as to which genes or pathways have 
non-redundant roles in oncogenesis. Using 
such data, we may eventually be able to under- 
stand the totality of biological perturbations 
that, acting together, result in the phenotypic 
diversity of human cancer. There is also the 
potential to deconvolve the order in which 
pathways are altered during disease progres- 
sion, which is likely to be non-random owing 
to genetic interactions. Gaining understand- 
ing of these two issues may be key to successful 
prevention and treatment strategies. 
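
One common way to put a number on mutual exclusivity is a one-sided hypergeometric (Fisher exact) test. The sketch below is only an illustration under stated assumptions: the function name, the tumour counts and the choice of test are hypothetical and are not taken from the studies discussed here, which may use different statistics.

```python
from math import comb


def mutual_exclusivity_p(n_tumours, n_a, n_b, n_both):
    """Probability of seeing n_both or fewer tumours mutated in both genes
    if mutations in gene A and gene B occurred independently.

    n_tumours: total tumours examined
    n_a, n_b:  tumours carrying a mutation in gene A / gene B
    n_both:    tumours carrying mutations in both genes
    """
    # Smallest overlap that is arithmetically possible.
    lowest = max(0, n_a + n_b - n_tumours)
    total = comb(n_tumours, n_b)
    tail = sum(
        comb(n_a, x) * comb(n_tumours - n_a, n_b - x)
        for x in range(lowest, n_both + 1)
    )
    return tail / total


# Hypothetical cohort: 100 tumours, 30 mutated in each gene, none in both.
p_value = mutual_exclusivity_p(100, 30, 30, 0)
print(f"p = {p_value:.2e}")
```

A small p-value here says only that the two mutations co-occur less often than chance would predict; it does not by itself show that the genes act in the same pathway.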

Comparing the type and frequency of 
genetic alterations, and the overall genomic 
structure, in different tumour classes also 
gives insight into the underlying mutational 
processes at play. The accumulation of muta- 
genic cellular processes, endogenous and envi- 
ronmental exposures, and DNA-repair defects 
over many years or decades results in genomic 
‘scars’ that can help us to understand the cause 
of the disease in an individual. The mutagenic 
fingerprints of tobacco smoking and sunlight 
exposure, for example, are obviously mani- 
fest in some cancers, but new phenomena are 
also being described and neologisms coined 
to describe them, such as chromothripsis for 
the shattering of individual chromosomes 
or kataegis for discrete genomic regions 
peppered with mutations. Many other pre- 
viously unknown mutational processes also 
seem to be involved in the development of 
particular cancers. Studying these may reveal 
other influences on cancer development. 

An enormous amount has already been 
gleaned from these initial analyses, but much 
remains to be done. First, there is a strong case 
for completing a comprehensive and detailed 
survey of the entire panoply of human cancers. 
Paradoxically, rather than increasing complex- 
ity, this should allow further common themes 
to emerge from the noise. Second, most 
analyses of driver mutations have focused on 
protein-coding regions, which comprise only 
about 1% of the human genome. But it seems 
probable that studying non-coding regions will 
reveal a wealth of cancer-related mutations. 
Third, epigenetic alterations to the genome — 
which affect gene expression without changing 
the underlying DNA sequence — that cause 
or occur during cancer development need 
to be integrated into this landscape. Fourth, 
most of the tumours studied so far have been 
primary cancers before treatment; metastatic 
and treatment-resistant genomes also need to 
be studied in detail. Last, several studies have 
highlighted the genetic variation between 
individual cells within a tumour, and further 
analysis is needed to ascertain the prevalence 
of this phenomenon. 


Alan Ashworth is at the Institute of Cancer 
Research, London SW7 3RP, UK. 
e-mail: alan.ashworth@icr.ac.uk 


A clinical 
perspective 


THOMAS J. HUDSON 


Classifying cancers using a broad, cross- 
tumour perspective provides not only 
biological insight, but also clinically relevant 
information. The value of the pan-cancer 
approach is demonstrated by Kandoth and col- 
leagues’ study focusing on the simplest forms 
of mutation — single-nucleotide substitutions, 
or insertions or deletions of a few nucleotides, 
in the sequences of protein-coding genes. By 
applying stringent statistical tests based on the 
recurrence rates of such mutations, the authors 
identify 127 genes that are significantly mutated 
in a combined analysis of 3,281 tumours rep- 
resenting 12 tumour types. Although many 
genes in the list have previously been reported 
as mutated in cancer, the occurrence of these 
mutations across a wide range of cancers has 
not been appreciated until now. 

Figure 1 | Pan-cancer analysis. The Cancer Genome Atlas Research Network has presented a series of 
initial findings from comparisons of the tumour characteristics and clinical data of thousands of patients, 
covering 12 major types of cancer. 

Kandoth et al. also investigated these 
127 genes as indicators of disease prognosis, 
using clinical data collected by the TCGA, 
such as time to disease recurrence and time 
to death. Although survival analyses across 
cancer types are made difficult by the hetero- 
geneity of clinical features related to different 
tumours, such as age of presentation, treat- 
ment modalities or metastatic potential, the 
large size of the study gave sufficient power 
to reveal several prognostic correlates. For 
example, mutations in several genes, includ- 
ing BAP1, DNMT3A, KDM5C, FBXW7 and 
TP53, were found to correlate with poor prog- 
nosis, whereas mutations in two genes, BRCA2 
and IDH1, often correlated with improved 
prognosis. 

It is worth noting that multi-tumour analy- 
ses can miss biomarkers that are prognostic 
indicators in single tumour types (such as 
KDM6A and ARID1A in bladder cancer), 
affirming the value of analysing data at both 
the individual tissue-type and pan-cancer 
levels. However, if the prognostic significance 
of pan-cancer genes is validated in large pro- 
spective studies of patients with cancer, clinical 
assessments of these genes may help to iden- 
tify patients at higher risk of metastatic relapse, 
who could benefit from adjuvant therapies. 
This strategy has already been applied in 
patients with early-onset breast cancer through 
the use of multi-gene expression profiles. 
In future, it will be useful to correlate genes 
identified as being relevant to multiple 
cancer types with drug responses, although 
this information will require greater integra- 
tion of genomic profiles in clinical trials and 
cancer registries, and new models of data 
sharing among research institutions. 

How can this pan-cancer project be 
exploited in drug development? One way is 
through the ranking of drug targets, which 
can be used to prioritize drug-development 
projects. More important, however, is the iden- 
tification of functional relationships between 
groups of genes, or pathways. Pharmacologi- 
cal modulation of such pathways provides an 
alternative route for drug development when 
candidate genes encode proteins that are not 
deemed to be appropriate drug targets. Several 
pathways implicated recently by other cancer- 
genome projects (such as pathways involved 
in RNA splicing, transcription regulation 
and metabolism) have been confirmed in 
the pan-cancer analysis, reinforcing the idea 
that these pathways should be considered as 
therapeutic targets. 

We should anticipate many more surprises 
as additional tumour types, mutation catego- 
ries (including those in non-coding regions of 
the genome) and functional annotations of 
genomes are integrated in the next genera- 
tion of pan-cancer studies. The determination 
of the common denominators — and the out- 
liers — in cancer has the potential to benefit 
patients through improved laboratory tests, 
new drug-development opportunities and 
better-informed treatment decisions. 

Combs for molecules 


Lasers known as frequency combs have been used to generate molecular spectra 
from samples within microseconds and with high spatial resolution. This offers 
fresh prospects for making microscopy observations in real time. SEE LETTER P.355 


YARON SILBERBERG 


Imagine a microscope that can be tuned to 
see specific molecules, or to analyse the 
molecular content of a sample. This dream 
has driven much of the work in advanced 
microscopy in recent years, particularly 
towards instruments that can be tuned to 
detect molecules through their specific vibra- 
tion frequencies. In this issue, Ideguchi et al. 
report a crucial step in this direction: a micro- 
scopy method that uses light sources known as 
frequency combs. The work resolves a problem 
that has hindered the use of a technique called 
impulsive Raman spectroscopy in microscopy. 

Molecules can be thought of as atoms 
connected by springs, and so their vibra- 
tions depend on their precise structure. This 
means that each molecule is characterized by 
a specific set of vibration frequencies, which 
generate a ‘fingerprint’ that can be detected 
using Raman spectroscopy. 

One simple way to induce vibrations in a 
molecule is to irradiate it with an intense pulse 
of light. Because molecular vibrations typically 
oscillate with periods of a few tens of femto- 
seconds (1 femtosecond is 10⁻¹⁵ seconds), the 
pulse should be no longer than this. To identify 
the molecule, a second pulse of light is then 
needed. This pulse may enhance or attenuate 
the vibration, depending on the delay between 
the two pulses — that is, depending on when 
the second pulse arrives relative to the molec- 
ular oscillation cycle. More importantly, the 
vibrating molecules slightly shift the frequency 
of the delayed light pulse up or down, depend- 
ing on when exactly it arrives. This spectral 
shift is easy to detect. 

To determine the molecule’s vibration fre- 
quency, a series of pairs of pulses is used in 
which the delay between pulses is systemati- 
cally varied. This causes a periodic change in 
the output spectrum, from which the molecu- 
lar vibration frequency can be calculated. If the 
molecule vibrates at several frequencies, or if 
the sample contains a mixture of materials, the 
frequency-shift pattern is more complex, but a 
simple mathematical analysis can extract indi- 
vidual frequencies, thereby allowing molecules 
to be identified. 
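
The "simple mathematical analysis" is, in essence, a Fourier transform of the signal recorded as a function of pulse delay. The sketch below illustrates the idea with a simulated signal; the two vibration frequencies (20 THz and 30 THz), their amplitudes, the 5 fs delay step and all function names are assumptions chosen for illustration, not values from Ideguchi and colleagues' experiment.

```python
import cmath
import math


def delay_scan_signal(delays_fs, modes):
    """Simulated spectral-shift signal at each pulse delay.

    modes is a list of (frequency in THz, amplitude) pairs, one per vibration.
    """
    return [
        sum(amp * math.sin(2 * math.pi * f_thz * 1e-3 * t) for f_thz, amp in modes)
        for t in delays_fs
    ]


def amplitude_spectrum(signal, step_fs):
    """Plain discrete Fourier transform; returns (frequencies in THz, amplitudes)."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal)
        )
        freqs.append(1e3 * k / (n * step_fs))  # cycles per fs converted to THz
        amps.append(2 * abs(coeff) / n)
    return freqs, amps


def find_peaks(freqs, amps, threshold):
    """Frequencies at which the spectrum has a local maximum above threshold."""
    return [
        freqs[k]
        for k in range(1, len(amps) - 1)
        if amps[k] > threshold and amps[k] > amps[k - 1] and amps[k] >= amps[k + 1]
    ]


STEP_FS = 5.0   # assumed delay increment between successive pulse pairs
N_DELAYS = 400  # assumed number of delays in one scan (2 ps in total)

delays = [i * STEP_FS for i in range(N_DELAYS)]
signal = delay_scan_signal(delays, [(20.0, 1.0), (30.0, 0.5)])
freqs, amps = amplitude_spectrum(signal, STEP_FS)
print(find_peaks(freqs, amps, threshold=0.1))
```

In this idealized case the scan covers a whole number of vibration cycles, so each mode falls on a single frequency bin; real data would show broadened peaks and noise, and the peak search would need to be correspondingly more careful.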

This spectroscopic method, known as 
impulsive Raman scattering, has been around 
for a few decades. It has even been adapted 
for microscopy, but here it suffers from a 
major drawback: measurements must be per- 
formed repeatedly at every point in a sample, 
with each picture element being acquired 
quickly enough for a whole image to be 
obtained in a reasonable time frame. In all pre- 
viously reported methods, the delay between 
pulses was varied by mechanically moving 
a mirror. Such back-and-forth mechanical 
motions of a mirror cannot be done faster than 
a few times per second, and therefore acquir- 
ing a complete image with a sufficient num- 
ber of picture elements would take too long for 
most applications. 

Enter the wizardry of frequency combs. 
These are lasers that produce a train of ultra- 
short pulses at a highly precise rate, typically 
around 100 million pulses per second. They 
generate a precise ‘comb’ of optical frequen- 
cies and so have revolutionized atomic 
spectroscopy, which requires exquisite fre- 
quency control. Indeed, Theodor Hänsch, 
one of the co-authors of Ideguchi and col- 
leagues’ paper, was awarded the 2005 Nobel 
Prize in Physics for the development of these 
light sources. 

Frequency combs are not obviously suited 
to molecular spectroscopy and microscopy, 
because these fields do not usually require the 
sensitive frequency control of atomic spectro- 
scopy. But Ideguchi and co-workers show that 
frequency combs can be harnessed to speed up 
impulsive Raman scattering so that it can be 
used for microscopy. The authors combined 
light from two such lasers, which produce 
pulses at slightly different repetition rates; the 
delay between pulses from each laser therefore 
varies (Fig. 1). This approach to varying the 
delay requires no moving parts, and the rate of 
variation can be easily controlled by manipu- 
lating the lasers’ pulse-production rates. 

Figure 1 | Frequency combs for molecular spectroscopy. a, Ideguchi et al. have used two laser sources 
(known as frequency combs) that have different pulse-production rates to generate pairs of light pulses 
(vertical lines) so that the time between pulses varies. b, They used these sources to excite molecular 
vibrations in a sample (not shown). The red line indicates the amplitude of a molecular vibration 
stimulated by a pulse from laser 1 (dark-green line) over time; the light-green line indicates the arrival 
time of a pulse from laser 2. Vibration duration is not shown to scale; vibrations last for only a small 
fraction of the time between consecutive laser pulses. c, Vibrations excited by the first pulse shift the 
spectrum of the second pulse in different directions, depending on the point of the vibration cycle at 
which the second pulse arrived. The researchers analysed the spectral shifts to determine the frequency of 
the molecular vibrations. 
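
The arithmetic behind this trick is short: if the two lasers emit pulses at rates f1 and f2, each successive pair of pulses is separated by an extra |1/f1 - 1/f2|. The sketch below works through it, assuming a 100 MHz repetition rate (the "typical" figure quoted above) and a hypothetical 100 Hz difference between the lasers; the function names and the 2 ps scan range are likewise illustrative assumptions, not parameters reported by the authors.

```python
def delay_step_s(rate1_hz, rate2_hz):
    """Extra delay gained with each successive pulse pair: |1/f1 - 1/f2|."""
    # Written as a single fraction to avoid subtracting two nearly equal numbers.
    return abs(rate2_hz - rate1_hz) / (rate1_hz * rate2_hz)


def scan_time_s(delay_range_s, rate1_hz, rate2_hz):
    """Laboratory time needed for the delay to sweep through delay_range_s."""
    pulse_pairs = delay_range_s / delay_step_s(rate1_hz, rate2_hz)
    return pulse_pairs / rate1_hz  # one pulse pair arrives per period of laser 1


RATE1_HZ = 100e6             # assumed: about 100 million pulses per second
RATE2_HZ = RATE1_HZ + 100.0  # assumed: second laser faster by 100 Hz

step = delay_step_s(RATE1_HZ, RATE2_HZ)
scan = scan_time_s(2e-12, RATE1_HZ, RATE2_HZ)
print(f"delay step: {step * 1e15:.2f} fs per pulse pair")
print(f"time to sweep 2 ps of delay: {scan * 1e6:.2f} microseconds")
```

Under these assumed numbers the delay advances by roughly 10 fs per pulse pair, so a scan of a couple of picoseconds takes on the order of microseconds, with no mirror to move.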

By using just such a combined laser source 
and monitoring the output, Ideguchi et al. were 
able to acquire the vibration Raman spectra of 
samples in just a few microseconds — thou- 
sands of times faster than is possible with a 
mechanically based system, and certainly fast 
enough to apply the technique to microscopy. 
They recorded these spectra with sufficient 
resolution and signal-to-noise ratios to iden- 
tify the constituents of various mixtures of 
organic solvents, and to produce molecular 
images using a simple microscopy set-up. 

The authors’ system suffers from one major 
limitation: there is a long ‘dead time’ for each 
measurement. Molecular vibrations die out 
after a few picoseconds (1 picosecond is 
10⁻¹² seconds), but scanning continues for up 
to a few nanoseconds, as dictated by the repeti- 
tion rates of the lasers. In other words, data are 
collected during only about one-thousandth of 
the whole measurement time. And not only is 
most of the measurement time wasted, but also 
the sample is irradiated with much more light 
than is necessary for obtaining information. 
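
The one-thousandth figure follows from the ratio of the two timescales. As a rough sketch, assuming a vibration lifetime of 3 ps and a scan lasting 3 ns (illustrative stand-ins for "a few picoseconds" and "a few nanoseconds"; the function name is also hypothetical):

```python
def useful_fraction(signal_duration_s, scan_duration_s):
    """Fraction of each scan during which the molecules are still vibrating."""
    return min(1.0, signal_duration_s / scan_duration_s)


VIBRATION_LIFETIME_S = 3e-12  # assumed: "a few picoseconds"
SCAN_DURATION_S = 3e-9        # assumed: "a few nanoseconds"

baseline = useful_fraction(VIBRATION_LIFETIME_S, SCAN_DURATION_S)
# If the scan duration is set by the pulse period, a tenfold higher
# repetition rate would shorten it tenfold.
faster = useful_fraction(VIBRATION_LIFETIME_S, SCAN_DURATION_S / 10)
print(f"useful fraction: {baseline:.4f}; at ten times the rate: {faster:.4f}")
```

The second figure assumes that the scan duration scales inversely with the repetition rate, which is the simplest reading of "as dictated by the repetition rates of the lasers".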

Ideguchi et al. are well aware of these prob- 
lems, and argue that advances in frequency- 
comb technology will lead to sources that have 
much higher pulse-production rates, which 
should reduce the amount of dead time consid- 
erably. Indeed, frequency-comb sources with a 
rate of 10⁹ pulses per second, ten times higher 
than the one used by the authors, are already 
available, and more-exotic sources with even 
higher rates are being developed. As always, 
to be useful for microscopy, such sources will 
need to be engineered to a level that does not 
require a laser expert to run them, and to be 
accessibly priced. Such engineering has already 
brought ultrafast laser technology to a variety 
of surprising applications, and so the advent of 
practical, hands-off frequency-comb sources is 
probably just a matter of time. 

STEM CELLS 


Reprogramming in situ 


Cellular reprogramming to a stem-cell state has now been achieved in tissues of 
genetically engineered mice. This work signals a future for regenerative medicine 
in which tissue fates might be manipulated in living organisms. SEE ARTICLE P.340 


ALEJANDRO DE LOS ANGELES 
& GEORGE Q. DALEY 


Pluripotent stem cells sit atop the devel- 
opmental hierarchy: they can self-renew 
indefinitely and form all tissues of the 
body. Embryonic stem (ES) cells, obtained 
from early mammalian embryos, are the 
quintessential pluripotent cell, but induced 
pluripotent stem (iPS) cells can be generated 
by simply introducing four specific transcrip- 
tion factors into cultured cells and allowing 
them to grow under the laboratory conditions 
used to maintain ES cells, a process termed 
direct reprogramming. Despite these remark- 
able in vitro manipulations, the capacity of the 
cellular milieu and tissue microenvironment to 
support reprogramming within living organ- 
isms has not been explored. On page 340 of 
this issue, Abad et al. present the first report of 
such in vivo cellular reprogramming*. 

A standard technique for assessing the pluri- 
potency of human stem cells is to test their 
capacity to form teratomas when injected 
under the skin of immune-deficient mice. 
Teratomas consist of disorganized tissues 
of all three embryonic germ layers, and are 
believed to derive from abnormally growing 
germ cells or residual embryonic tissues. Occa- 
sionally, however, teratomas display a remark- 
able degree of organization, containing whole 
organs or limbs — a phenomenon termed 
fetus in fetu. This suggests that conditions 
supporting significant features of embryonic 
development do exist within our tissues, 
although the developmental origins of this 
event remain obscure. 

Abad et al. generated mice carrying a drug- 
inducible ‘gene cassette’ of the four cell-repro- 
gramming factors. To induce the expression of 
these transcription factors, the authors added 
the drug doxycycline to the animals’ drinking 
water. Remarkably, after several weeks, tera- 
tomas appeared in various tissues, indicating 
that in situ reprogramming had occurred. This 
suggests that selective laboratory culture con- 
ditions are not essential for reprogramming to 
pluripotency, and that the in vivo environment 
can substitute for them. 

During early mammalian development, 
the fertilized egg (zygote) and its immediate 
daughter cells are totipotent. Totipotency is 
defined as the potential to differentiate not 
only into all cell types of the growing embryo 
(pluripotency), but also into extraembryonic 
cells of the placenta, such as trophoblasts, 
which sustain the growth and development of 
the entire organism. 

*This article and the paper under discussion were 
published online on 11 September 2013. 

Intriguingly, Abad et al. found that repro- 
grammed iPS cells isolated from the blood of 
the doxycycline-treated mice had totipotent- 
like features. In addition to generating cells 
from the three embryonic germ layers, the iPS 
cells could form trophoblast stem-like cells 
in the laboratory — evidence of these cells’ 
expanded potential to generate derivatives 
of the trophectoderm lineage. Moreover, the 
teratomas derived from the reprogrammed 
iPS cells from mice contained ‘trophoblast 
giant cells’. 

When the authors introduced the in vivo- 
derived iPS cells into preimplantation 
embryos, the cells efficiently contributed to 
the trophectoderm lineage, including the 
placenta. Strikingly, these iPS cells also gener- 
ated embryo-like structures in the abdominal 
cavity of the reprogrammable mice (similar to 
fetus in fetu), suggesting that the iPS cells have 
the potential to self-organize into a complete 
organism. Such totipotent-like features are 
not seen in ES cells or in iPS cells that have 
been reprogrammed in vitro. The present 
landmark findings, therefore, have implica- 
tions for our understanding of the relationship 
between pluripotency and totipotency, and for 
future attempts at mammalian regeneration 
in situ (Fig. 1). 

It is well established that the cell-culture 
milieu profoundly influences the properties 
of pluripotent stem-cell lines, including their 
developmental potential and epigenetic fea- 
tures (chemical changes to genomic DNA and 
its associated histone proteins that influence 
cell identity and developmental potential). 
The growth conditions used for maintain- 
ing in vivo-derived iPS cells are similar to 
those for mouse ES cells and iPS cells derived 
in vitro. So one might assume that all three cell 
types would manifest similar developmental 
potential. However, it seems that in vivo repro- 
gramming fixes iPS cells in a more pristine, 
totipotent-like state. Abad and co-workers’ 
gene-expression profiling reveals that in vivo- 
derived iPS cells share transcriptional features 
with morulas (early-stage embryos), but more 
research is needed to dissect the functional 
relevance of the morula-enriched gene set. 

Figure 1 | Pluripotent stem cells and their potential. a, The preimplantation embryo, or blastocyst, 
consists of the inner cell mass that gives rise to the fetus. Its outer cells, the trophoblast, develop into the 
extraembryonic tissues of the placenta. Embryonic stem (ES) cells, derived from blastocysts, self-renew 
in culture and are pluripotent, giving rise to all cell types of the growing embryo. ES cells rarely exhibit 
totipotent-like features such as the potential to form extraembryonic tissues. b, Differentiated adult cells 
cultured in the laboratory can be reprogrammed by overexpression of reprogramming transcription 
factors. These in vitro-reprogrammed induced pluripotent stem (iPS) cells are also mainly pluripotent. 
c, Abad et al. show that iPS cells generated by reprogramming of adult mouse cells in vivo maintain a 
totipotent-like developmental potential; the cells can contribute to extraembryonic tissues as well as 
embryonic tissues. 

Further investigation may also reveal 
the epigenetic mechanisms that underlie 
the expanded developmental potential of 
in vivo-derived iPS cells. Previous work has 
suggested that DNA methylation is crucial 
to safeguard pluripotency against commit- 
ment to extraembryonic lineages. Although 
the widely used mouse ES cells and in vitro- 
reprogrammed iPS cells are functionally 
pluripotent, the fact that they undergo an 
in vitro-programmed lineage restriction raises 
provocative questions regarding the fidelity of 
cell-state transitions induced in cell culture, 
and about the accuracy of cellular models 
generated by differentiation or manipulation 
in the laboratory. 

Do human pluripotent stem cells have toti- 
potent-like potential? Human ES cells were 
initially thought to generate trophectoderm
on treatment with the protein BMP4. However, 
subsequent work showed that BMP4-treated 
human ES cells generate a subpopulation of 
cells that resemble extraembryonic mesoderm 
and do not correspond to genuine placental 
trophoblasts. It is now thought that, rather
than corresponding to an early totipotent- 
like state, human ES and iPS cells represent a 
distinct ‘primed’ state of pluripotency corre- 
sponding to a later stage of embryonic devel- 
opment than that of ‘naive’ mouse pluripotent 
stem cells. This fundamental distinction
between mouse and human pluripotent stem 
cells may greatly influence the potential to 
produce extraembryonic lineages. We specu- 
late that generation of human pluripotent 
stem cells with similar features to mouse ES 
cells may improve access to extraembryonic 
lineages in vitro. Generation of real placental 
derivatives from human pluripotent stem cells 
would enable modelling of placenta-associated 
disorders. 

Abad and co-authors’ work represents a 
landmark for what could become a powerful 
strategy in regenerative medicine — tissue 
reprogramming in situ. A hallmark of limb 
regeneration in amphibians is the formation 
of a blastema, a mass of dedifferentiated pro- 
liferating cells that undergoes morphogenesis 
and redifferentiates to replace structures that 
have been lost by amputation. However, there 
is currently no mammalian counterpart to 
the amphibian blastema, although there is a 
growing interest in strategies to induce regen- 
erative responses in mammals, especially 
humans. In this regard, in vivo application of 
the latest transgene-free reprogramming technologies,
such as those using modified messenger
RNA sequences or a recently reported
reprogramming cocktail of small molecules,
may allow reprogramming in situ to proceed 
in a controlled manner. The growing parallels 
between reprogramming and regeneration 
should inspire the application of reprogram- 
ming technologies in living organisms for 
regenerative ends.
Super-luminous supernovae on the rise


New observations suggest that certain extremely bright supernovae are not the 
nuclear explosions of very massive stars. Instead, they may be ordinary-mass
events lit up by a potent central fountain of magnetic energy. SEE LETTER P.346 


DANIEL KASEN 


Although every supernova is remarkably
brilliant (at its peak, the average
stellar explosion shines about a billion
times brighter than the Sun), astronomers
have recently discovered an astonishing class
of super-luminous supernovae that outshine
the ordinary ones by almost a hundredfold.
These are very rare examples of extreme stel- 
lar death, and their progenitors are unclear, 
although it has been tempting to associate 
them with the most massive stars in the Uni- 
verse. On page 346 of this issue, Nicholl et al. 
present data that, for certain events, point to a
different origin. 

The origin of ordinary supernovae has been 
agreed on for decades; the most common 
events occur when a moderately massive star 
(one of around 10-20 solar masses) has nearly 
exhausted its nuclear fuel. The stellar core, now 
filled with ash, cannot maintain the pressure 
to withstand its own gravity, and collapses to 
a dense, compact nugget — a neutron star — 
releasing enough energy in the process to blow 
away the outer layers in a supernova explosion.

For extremely massive stars, however, a 
different, and much more energetic, outcome
may be possible. A star initially more massive
than about 140 solar masses becomes so hot
in its interior that pairs of electrons and anti- 
electrons are spontaneously produced from the 
thermal bath. The energy expended in making 
these particles depletes the pressure support, 
and the star becomes ‘pair unstable’. The core 
begins to fall inwards, but this time with its fuel 
tank still completely full. 

The outcome is, predictably, catastrophic. 
As the core contracts and becomes com-
pressed, burning accelerates exponentially, 
and nearly all the remaining fuel is consumed 
within seconds. That extreme energy release 
completely blows the star apart, expelling a 
massive cloud of highly radioactive debris. 
The radioactive glow of the expanding cloud 
can be visible from more than a billion light 
years away. 

The theory of these hyper-energetic nuclear 
explosions, called pair-instability supernovae
(pair-SNe), was proposed in the 1960s,
but it was only a few years ago that astrono-
mers found evidence of an actual event. A
remarkably luminous supernova, named 
SN 2007bi, resembled the theoretical predic- 
tions; in particular, its brightness gradually 
faded at a rate consistent with the half-life of 
cobalt-56, a radioisotope produced abundantly 
in pair-SNe. 
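
The link between a radioactive half-life and a fading rate can be made concrete with a small sketch. It assumes that the debris traps all of the decay energy, so that the luminosity simply follows exponential decay, and it uses an approximate cobalt-56 half-life of 77.2 days; neither assumption is taken from the observations discussed here, and the function names are illustrative.

```python
import math

# Approximate half-life of cobalt-56 in days (assumed reference value).
CO56_HALF_LIFE_DAYS = 77.2


def fraction_remaining(days, half_life=CO56_HALF_LIFE_DAYS):
    """Fraction of the original isotope left after `days` of decay."""
    return 0.5 ** (days / half_life)


def fade_rate_mag_per_day(half_life=CO56_HALF_LIFE_DAYS):
    """Decline in astronomical magnitudes per day for a light curve that
    tracks the decay exactly (full trapping of decay energy assumed)."""
    # A factor-of-two drop in flux corresponds to 2.5 * log10(2) magnitudes.
    return 2.5 * math.log10(2.0) / half_life


if __name__ == "__main__":
    rate = fade_rate_mag_per_day()
    print(f"Expected fade: {100 * rate:.2f} mag per 100 days")
    print(f"Fraction left after 1 year: {fraction_remaining(365.0):.3f}")
```

Under these assumptions a cobalt-powered tail fades by roughly one magnitude every hundred days, which is the kind of slow, steady decline the comparison with SN 2007bi relies on.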

The discovery excited but confused theo- 
rists. Pair-SNe are expected to occur in pris- 
tine regions of pure hydrogen and helium 
gas. SN 2007bi was found in a galaxy mildly 
polluted by chemical elements heavier than 
hydrogen and helium — what astrono- 
mers call metals. Theory suggests that stars 
containing even small traces of metals will 
continuously shed material in winds, losing so 
much mass early in their lives that they avoid 
the pair instability. If SN 2007bi was indeed a 
pair-SN, our understanding of the formation 
and evolution of very massive stars needed to 
be reconsidered. 

As it turns out, there is a relatively simple 
test of whether a supernova is big enough to 
be a pair-SN. The more massive and opaque a 
debris cloud, the longer it takes light to diffuse 
out of it. The radioactive glow of a giant pair-SN
should therefore rise to its peak brightness unusually
slowly, over a period of about a year.
That is several times longer than the rise of
an ordinary-mass supernova. Unfortunately,
astronomers did not catch the rise of SN 2007bi;
they discovered it just as it was peaking.

Figure 1 | The Crab nebula. At the centre of the Crab nebula, the remnant of a supernova that
exploded nearly 1,000 years ago, a spinning, magnetized neutron star is slowly injecting energy into
the surrounding gas cloud, lighting it up. A similar, but more extreme, physical process may explain the
super-luminous supernovae observed by Nicholl and colleagues. A neutron star spinning ten times faster
than the one in the Crab nebula, and with magnetic fields 100 times stronger, would inject its spin energy
much more rapidly, within a few months, and shine more than a million times more brightly.

But now Nicholl et al. have discovered two 
super-luminous supernovae that are dead 
ringers for SN 2007bi. This time the events 
were caught early, and the rise time to peak 
could be measured. The rise was relatively 
rapid, about two months, implying a moderate 
debris mass of only 10-20 solar masses. Their 
conclusion: these two new supernovae — and 
presumably SN 2007bi, by association — are 
not pair-SNe. 
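
The rise-time argument can be sketched numerically. The snippet below assumes the widely used photon-diffusion timescale t ≈ sqrt(2κM / (βcv)) with β ≈ 13.8, together with an illustrative opacity and expansion velocity. These values are assumptions chosen for the example, not numbers from Nicholl et al., so only the scaling with mass should be read from it; absolute times shift considerably with the assumed opacity and velocity.

```python
import math

M_SUN_G = 1.989e33   # solar mass in grams
C_CM_S = 2.998e10    # speed of light in cm/s
BETA = 13.8          # geometric constant in the standard diffusion estimate
DAY_S = 86400.0      # seconds per day


def diffusion_time_days(mass_msun, velocity_cm_s=1.0e9, opacity_cm2_g=0.1):
    """Approximate time for light to diffuse out of expanding debris.

    velocity_cm_s and opacity_cm2_g are illustrative assumptions.
    """
    mass_g = mass_msun * M_SUN_G
    t_seconds = math.sqrt(
        2.0 * opacity_cm2_g * mass_g / (BETA * C_CM_S * velocity_cm_s)
    )
    return t_seconds / DAY_S


if __name__ == "__main__":
    for mass in (15.0, 100.0):
        print(f"{mass:5.0f} solar masses -> {diffusion_time_days(mass):6.1f} days")
```

At fixed velocity and opacity the timescale grows as the square root of the debris mass, so a cloud of 100 solar masses takes a few times longer to brighten than one of 15 solar masses. That is the sense in which a quick, two-month rise points to a moderate debris mass.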

What could they be? One existing idea,
favoured by Nicholl and colleagues, is that the
emission is powered not by radioactivity, but
by the activity of a spinning, highly magnetized
neutron star (a ‘magnetar’). In this picture,
the progenitor star was not extraordinarily
massive, but it was rotating rapidly, and on
collapse formed a magnetar spinning nearly
1,000 times per second. The kinetic energy
stored in that dense, whirling flywheel would
be enormous, with the strong magnetic fields
providing a mechanism to steadily transport
the spin energy to the surrounding debris
cloud, lighting it up. This would be an extreme
version of the emission seen from the remnants
of some ancient supernovae (Fig. 1). Simplistic
models of this process nicely explain the rise
and fall of SN 2007bi and its doppelgangers.
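
To give a sense of scale for the flywheel, here is a minimal sketch of the rotational energy E = ½IΩ² of a neutron star. The moment of inertia of 10^45 g cm² is a typical textbook assumption, not a value from the paper, and the luminosity scaling L ∝ B²Ω⁴ is the idealized vacuum-dipole relation, used here only to illustrate the comparison with the Crab nebula in Fig. 1.

```python
import math


def spin_energy_erg(spin_hz, moment_of_inertia_g_cm2=1.0e45):
    """Rotational kinetic energy (erg) of a star spinning spin_hz times per second.

    The default moment of inertia is an assumed, typical neutron-star value.
    """
    omega = 2.0 * math.pi * spin_hz  # angular frequency in rad/s
    return 0.5 * moment_of_inertia_g_cm2 * omega ** 2


def dipole_luminosity_ratio(spin_factor, field_factor):
    """Relative spin-down power under the vacuum-dipole scaling L ~ B^2 * Omega^4."""
    return field_factor ** 2 * spin_factor ** 4


if __name__ == "__main__":
    print(f"Spin energy at 1,000 Hz: {spin_energy_erg(1000.0):.2e} erg")
    print(f"Power relative to a Crab-like star: {dipole_luminosity_ratio(10.0, 100.0):.0e}")
```

With these assumptions a millisecond magnetar stores about 2 × 10^52 erg, and a star spinning ten times faster with fields 100 times stronger than a Crab-like pulsar would radiate about 10^8 times more power, consistent with the caption's 'more than a million times'.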

Hints of magnetar activity have been noted
in a few other supernovae that reach similar 
peak brightnesses to SN 2007bi, but fade more 
rapidly after peak, perhaps pointing to a unify- 
ing mechanism for a range of super-luminous 
events. But other mechanisms for produc- 
ing very bright supernovae are possible; for 
example, expanding supernova debris may 
encounter a dense shell of gas, and light up in a
violent collision. Nicholl and colleagues’ data
should be valuable in discriminating between 
different models. 

Meanwhile, the pair-SNe, after a brief fling 
with reality, seem to have crept back into the 
realm of theoretical conjecture. Having failed 
to find a convincing candidate in their survey, 
Nicholl et al. argue that these events must be 
rare in the nearby Universe, fewer than 1 for
every 100,000 ordinary supernovae. But our 
best chance of finding one may be to look into 
the very distant, very early Universe. Back 
then, stars were probably bigger, and mostly
free of metals. Future telescopes should be 
able to see a long way there; maybe they will 
catch a glimpse of these largest of nuclear 
explosions.
Killing from the inside


Lysosomes are the main degradative compartment in cells, but they are also 
involved in cell-death pathways. Studies using existing drugs show that lysosomes 
are excellent pharmacological targets for selectively destroying cancer cells. 


PAUL SAFTIG & KONRAD SANDHOFF 


There have been numerous efforts to
identify the Achilles heel of cancer and 
to find ways of killing tumour cells 
while leaving normal cells unaffected. The 
development of cancer chemotherapy started 
in the 1940s, and our increasing understanding 
of cancer biology has led to ever more precisely 
targeted therapies. Most of these strategies 
target the abnormal proliferative behaviour 
of cancer cells. Now, writing in Cancer Cell, 
Petersen et al.¹ propose an alternative intracellular
anticancer target: the lysosome*.

For a long time, lysosomes were misleadingly
regarded as the cell’s waste bin, but we
now know that they are more akin to cellular 
stomachs. In the lysosome, macromolecules 
are degraded by hydrolase enzymes, including 
protein-degrading cathepsin enzymes, and the 
resulting components are released as nutrients 
into the cytoplasm. Importantly, lysosomes 
are involved in several cellular processes, 
such as membrane repair, pathogen defence, 
autophagy and signalling². The lysosomes in
cancer cells are more numerous, larger and 
have greater cathepsin activity than those in 
normal cells, and the release of cathepsins from 
cancer-cell lysosomes into the extracellular 
space can promote tumour progression³.

Lysosomes are also involved in cell death
— the release of certain cathepsins from the 
lysosome into the cytoplasm is thought to 
trigger death by apoptosis and apoptosis-like 
pathways⁴. This release occurs by a process
known as lysosome membrane permeabilization
(LMP), which possibly occurs following
certain changes to the composition of membrane
lipids and major lysosomal membrane
proteins⁵. LMP can be induced by various
stimuli, including reactive oxygen species and
endogenous apoptotic stimuli. However, cancer
cells seem to overcome this threat of death by
invoking the action of the protein Hsp70, which
is expressed in many tumour types. Hsp70
specifically binds to a negatively charged lipid
called bis(monoacylglycero)phosphate (BMP),
which is found in the membrane of vesicles in
the lysosome lumen⁶. This binding activates
acid sphingomyelinase (ASM), an enzyme
that breaks down the lipid sphingomyelin⁷,
which is a typical and important component of
cell membranes. Interestingly, increased ASM
activity seems to support lysosomal integrity.
Prompted by this observation, Petersen et al.
hypothesized that inhibiting ASM in cancer
cells would increase lysosomal fragility, LMP
and cell death (Fig. 1).

*This News & Views article was published online
on 2 October 2013.

It was already known that cationic amphi- 
philic drugs (CADs) — substances that are 
well established for the treatment of depres- 
sion, allergies and hypertension — act as ASM 
modulators. At the low pH of the lysosome, 
the drugs interfere with the electrostatic inter- 
action between ASM, which is cationic, and 
the anionic surface of BMP-rich intralysosomal
membranes⁸,⁹. The displaced ASM is
then rapidly degraded by cathepsins. Petersen 
and colleagues tested the effects of CAD treat- 
ment on several types of cancer cell, and found 
that the drugs killed the cells at much lower 
concentrations and shorter exposure times 
than were required for them to affect the viability
of non-transformed cells. CAD treatment
also led to reduced tumour growth in animal 
models. Furthermore, the authors found that 
cancer cells that were resistant to many other 
anticancer drugs were susceptible to CADs. 
Fascinatingly, this treatment restored the cells’ 
susceptibility to the other drugs. 
Figure 1 | Lysosomes as a therapeutic cancer target. a, The degradation of macromolecules in
lysosomes is achieved by hydrolase enzymes, including cathepsins. Another lysosomal enzyme is acid
sphingomyelinase (ASM), which breaks down the membrane lipid sphingomyelin. ASM is positively
charged and associates with another, negatively charged, lipid called BMP, which is found in the
membranes of vesicles in the lysosome lumen. b, ASM activity is lower in cancer cells than in normal cells,
and thus sphingomyelin levels are higher. Petersen and colleagues¹ show that cationic amphiphilic drugs
(CADs) selectively kill cancer cells. CADs are positively charged, so they can displace ASM from vesicular
membranes such that it is degraded by cathepsins. It is possible that this blocks the residual ASM activity
in cancer cells, leading to even higher levels of sphingomyelin, which may disturb membrane homeostasis
and cause lysosome membrane permeabilization (LMP). This allows cathepsins to be released into the
cytoplasm, triggering cell-death pathways.
The reason for the high susceptibility of
cancer cells to ASM targeting is not completely 
understood. However, it might be explained 
by the fact that ASM activity is already low in 
cancer cells, and a further decrease could lead 
to a membrane-destabilizing level of sphingo- 
myelin in the lysosomes of these cells. Cancer 
cells have higher levels of membrane dynamics 
and cellular signalling than normal cells, and 
high concentrations of sphingomyelin might 
inhibit these processes. It is likely that other 
lipids that are ASM substrates also contribute 
to the fragility of lysosomal membranes in 
tumour cells. An additional aspect of the high 
sphingomyelin levels in cancer cells is that the 
export of cholesterol from lysosomes is inhibited¹⁰;
this leads to the inactivation of saposins
(essential cofactors for sphingolipid-degrading 
enzymes), and thereby a further reduction in 
sphingolipid degradation. 

Thus, it seems that CAD treatment may lead 
to a generally dysfunctional lysosomal-lipid 
homeostasis that severely affects the physiol- 
ogy of this cellular compartment and favours 
lysosome-related cell-death pathways. It would 
be useful to analyse the specificity of CADs for
tumour-cell targeting in genetically defined 
animal models, such as mice lacking or over- 
expressing ASM. Another aspect yet to be inves- 
tigated is the rate of uptake of CADs in tumour 
cells compared with healthy cells. Surprisingly, 
although the events following LMP are well 
understood, a long-standing question is how 
LMP is mediated at a molecular level. Further 
studies of the role of CADs may help to explain 
how alterations in the lipid or protein composi- 
tion of lysosomal membranes lead to transient 
and enzyme-specific lysosomal leakage.

Despite these remaining questions, Petersen
and colleagues’ findings argue for an in-depth 
pharmacological and epidemiological study 
of the effect of CAD treatment on cancer 
outcomes. CADs are relatively cheap drugs 
that have limited side effects, but their activ- 
ity in lysosomal killing pathways is probably 
not sufficient for effective cancer therapy 
when used alone, so combination treatments 
with other chemotherapeutic compounds 
might be advisable. Future research might 
also reveal specific and potent modulators of
lysosomal-sphingolipid metabolism that are
even more effective than CADs at inducing the
death of cancer cells through this pathway.


Paul Saftig is at the Biochemisches Institut,
Christian-Albrechts-Universität Kiel, 24118
Kiel, Germany. Konrad Sandhoff is at the
Life and Medical Sciences Institute, Universität
Bonn, 53121 Bonn, Germany.

e-mail: psaftig@biochem.uni-kiel.de


1. Petersen, N. H. T. et al. Cancer Cell 24, 379-393 (2013).
2. Saftig, P. & Klumperman, J. Nature Rev. Mol. Cell Biol. 10, 623-635 (2009).
3. Mohamed, M. M. & Sloane, B. F. Nature Rev. Cancer 6, 764-775 (2006).
4. Boya, P. & Kroemer, G. Oncogene 27, 6434-6451 (2008).
5. Fehrenbacher, N. et al. Cancer Res. 68, 6623-6633 (2008).
6. Kirkegaard, T. et al. Nature 463, 549-553 (2010).
7. Linke, T. et al. Biol. Chem. 382, 283-290 (2001).
8. Hurwitz, R., Ferlinz, K. & Sandhoff, K. Biol. Chem. 375, 447-450 (1994).
9. Kornhuber, J. et al. Cell Physiol. Biochem. 26, 9-20 (2010).
10. Abdul-Hammed, M. et al. J. Lipid Res. 51, 1747-1760 (2010).


To bind or not to bind


Finding a way to control how particles bind to cells could open up opportunities
for biomedical research. The discovery of a method for directing the orientation
of particle-cell interactions is therefore a cause for excitement.


ANDREA J. O'CONNOR & FRANK CARUSO


The interactions of small particles with
cells are crucial in biology, such as when
immune-system cells remove dirt and
bacteria to stop an infection. Such interactions
have many potential applications in promising
medical therapies, and have fuelled the
growing field of nanomedicine. Writing in
Advanced Materials, Gilbert et al. report a
significant contribution to this field: the preparation
of tube-shaped particles that attach to
cells in different ways depending on the tubes’
surface properties. This could lead to new ways
to deliver drugs into target cells and to create
constructs from cells.

Small particles (nanometres to micrometres
in size) can interact with cells in many different
ways, depending on the type of cell, the
local environment and, notably, the physicochemical
properties of the particles. Scientists
and engineers are therefore working on
approaches to tailor both the physical and the
chemical properties of particles in order to
develop control over their interactions with
cells and tissues.

The size, stiffness, shape and chemical
make-up of small particles all strongly
influence how such particles interact with cells:
they may bind to the outside of a cell, be
taken up by the cell and trafficked through different
pathways within it, and ultimately even
change aspects of the cell’s functions. Gilbert
et al. have designed and made hollow, tubular,
polymeric particles that can have non-uniform
chemical properties, to see how this affected
the particles’ interactions with cells.

Gilbert et al. made the microtubes by assembling
layers of oppositely charged polymers
— polyelectrolytes — in the pores of a special 
type of filtration membrane that has straight 
pores. They then chemically cross-linked 
the polyelectrolytes to stabilize them, and 
dissolved away the membrane. Benefits of 
this method are that it can be used to make 
millions of microtubes at a time, and that it 
could be scaled up using multiple mem- 
branes. The dimensions of the microtubes 
could also be easily changed by altering the 
pore size or the thickness of the templating 
membrane. 

The authors prepared microtubes that were 
either cell-resistant or cell-adhesive by changing
the polymers used to make them. In a
smart variation, they also made microtubes 
that were cell-resistant along their length on 
the outside but cell-adhesive inside. Because 
cells cannot fit inside the tubes, they can bind 
only to the ends of the microtubes where the 
adhesive molecules are exposed. The researchers
therefore observed that these particles tend
to bind to cells end-on; that is, the particles’
chemical properties control their orientation
upon interaction with the cells. Conversely,
the authors also made microtubes that have
adhesive molecules along their outer length
but not on the ends; these tend to bind to
cells side-on.

Figure 1 | Surface properties control interactions of polymer tubes with cells. Gilbert et al. have
prepared polymeric, micrometre-scale tubes from cell-resistant (pink) and cell-adhesive (green)
polymers. a, When incubated with mouse cells, the microtubes made from only the cell-resistant polymer
did not bind to the cells. b, Microtubes that were cell-adhesive along the sides but cell-resistant on the
ends bound to cells side-on. c, Microtubes that were cell-resistant along the sides, but cell-adhesive at the
ends, bound to the cells end-on.

Much of our knowledge about particle 
interactions with cells originates from stud- 
ies of spherical particles that have uniform 
surface chemistry. However, it has become 
clear that particle shape has a major role in 
these interactions. For example, two particles 
made of the same polymer may have differ- 
ent rates of uptake by cells, solely because of 
differences in their shape and in their orientation
relative to the cells. Furthermore, different
surface patterns on particles can have
distinctively different effects on the particles’ 
interactions with cells. These effects are
important for many potential medical applica- 
tions of nano- and microparticles, particularly 
drug delivery. 

Particles that have tunable orientations 
for cell attachment, like those developed by 
Gilbert et al., might provide another level of 
control over drug delivery into targeted cells. 
Cell-binding ‘patches’ on particles could regu- 
late how those particles bind to cells, and even 
which cells they bind to, if biochemical mol- 
ecules that target particular proteins or recep- 
tors are added to the patches. Furthermore, the 
other key properties of such particles, includ- 
ing their size, aspect ratio (the ratio of width 
to length) and stiffness, could be used to influ- 
ence rates of particle binding and uptake into 
the cells. 

It can also be envisaged that new biomaterial–cell
constructs could be created by using
such particles to connect cells into a network 
whose properties are highly tunable, for exam- 
ple by changing the numbers, dimensions and 
cell-binding tendencies of the particles. This 
could facilitate the formation of biomimetic 
or tissue-like hybrid materials that contain 
living cells in an environment more like their 
native milieu than the current commonly used 
in vitro supports. 

Furthermore, regulation of the cellular 
microenvironment can alter cell functions 
such as differentiation of stem cells, cell- 
growth rates and gene expression. So, if cells
bind particles in a selective orientation, the 
particles could potentially also shape how 
these functions are organized in three dimen- 
sions within a tissue construct. Such orienta- 
tion-specific interactions could be valuable for 
generating scaffold-free constructs for tissue 
engineering. In the future, it might also be 
possible to form cell-based ‘polymers’, using
microtubes with engineered regions to link 
cells to form different architectures, including 
linear and branched systems. 

Several obstacles need to be overcome to 
apply the microtubes therapeutically. These 
include the formation of microtubes that are 


biocompatible and which can respond to, and 
degrade in, cellular or physiological condi- 
tions. It is likely that Gilbert and colleagues’ 
work will trigger research into optimizing the 
efficiency with which such particles can be 
produced and attached to cells, and also into 
what happens to the particles and cells when 
they interact over extended times in vitro and 
in vivo. This will be well worth it, because the 
prospect of being able to design particles to 
deliver payloads of drugs and organize cells in 
new ways is exciting.
An old drug plays a new trick


A drug already used to treat Parkinson’s disease induces repair of the damage 
that occurs to the myelin sheath around nerve fibres during multiple sclerosis. 
The finding offers new therapeutic avenues for this disease. SEE ARTICLE P.327 


HARTMUT WEKERLE & EDGAR MEINL 


What a change: just 20 years ago,
multiple sclerosis was a disease
without any promising treatment.
Today, it has become treatable. Indeed, the
number of drugs that effectively mitigate,
although unfortunately do not cure, the disease
is impressive, and is growing. But these
drugs work well only in blunting the early
inflammatory phase of multiple sclerosis; they
do not help to restore myelin, the protective
sheath surrounding the axons of neurons
in the brain and spinal cord that is damaged
in the disease. In this issue, Deshmukh et al.
(page 327) identify a drug, benztropine, that
may finally raise the hope of myelin repair*.

The authors show that benztropine pro- 
motes the differentiation of oligodendrocytes 
(the myelin-forming cells in the brain) in vitro 
and supports remyelination in animal models 
of multiple sclerosis (MS). What’s more, the 
drug is an old acquaintance: benztropine is 
well established as an approved treatment for 
Parkinson's disease.

The finding came from a monumental 
experimental effort that was organized in three 
stages. First, in a high-throughput set-up, the 
researchers exposed immature oligodendro- 
cyte progenitor cells to 100,000 different small 
molecules in individual culture wells (Fig. 1a).
The progenitors came from the optic nerves of 
newborn rats, in which myelin formation was 
just about to start. After 6 days of exposure, 
the cultures were screened for production of 
intracellular myelin basic protein (MBP), a key
product of myelin-forming cells. This process
identified compounds in more than a dozen
functional classes that drove MBP formation;
these molecules were then considered
further.

*This article and the paper under discussion were
published online on 9 October 2013.

MBP production is necessary but not suf- 
ficient for the formation of intact, compacted, 
myelin sheaths. To test the effects of their com- 
pounds on this latter process, the authors used 
a second, medium-throughput step to examine 
effects on the myelination of axons of neurons 
co-cultured with oligodendrocyte precursors 
(Fig. 1b). In the presence of one of the com- 
pounds, benztropine, more myelin wraps 
appeared around the axons than in untreated 
cultures. Additional pharmacological experi- 
ments revealed that the benztropine-driven 
myelination involved blocking the activity of 
muscarinic cholinergic receptors, extending 
a previous observation about the function of 
these receptors on oligodendrocytes.

These findings were made using cultures 
of rodent cells, but, as readers (and drug- 
regulatory agencies) would ask, could the drug 
promote myelin repair in live animals with an 
MS-like disease? In the third, low-throughput, 
stage (Fig. 1c), the investigators tested benztro- 
pine in two mouse models of demyelinating 
disease. The first, called experimental auto- 
immune encephalomyelitis, involves induc- 
ing an autoimmune response against myelin by 
immunizing the animals with a myelin auto- 
antigen. This condition recapitulates in vivo 
some essential features of human MS, such as 
large-scale demyelinated lesions accompanied 
by axonal damage. Benztropine substantially
mitigated ongoing disease and promoted 
remyelination in these mice. But these findings 
do not unequivocally prove remyelination to
be the sole therapeutic mechanism. Although
the drug did not grossly alter the function
of the immune cells involved in MS (which
include peripheral T lymphocytes and macrophages),
an action on immune-effector
mechanisms within the central nervous system
cannot be excluded.

Figure 1 | Identification of a myelin-promoting drug. Deshmukh et al. used a three-stage screening
and validation protocol to search for drugs that might restore the damage to myelin that occurs during
multiple sclerosis. a, In the high-throughput stage, the authors tested the effects of 100,000 compounds
on cultured rat oligodendrocyte progenitor cells, searching for agents that would induce differentiation of
oligodendrocytes, which produce myelin basic protein (MBP). One compound, benztropine, particularly
enhanced MBP production. b, In the medium-throughput stage, the authors show that benztropine
enhanced axon myelination when added to a co-culture of oligodendrocyte progenitors and neurons.
c, The authors then tested the effect of benztropine in two animal models of demyelination, one
immune-mediated and the other chemically induced. In both models, the drug promoted remyelination,
and in the immune-mediated model, it alleviated disease symptoms such as hind-leg and tail paralysis.

In another remarkable result, the authors 
report that, when given prophylactically, 
benztropine enhanced the density of mature 
oligodendrocytes in the animals’ central nerv- 
ous systems even before disease onset. Is this 
effect welcome, or could an excess of oligoden- 
drocytes be detrimental? That is not yet clear, 
but proof of benztropine’s myelin-promoting 
potential came from a second animal model, 
in which myelin is destroyed not through an 
inflammatory process, but by an orally administered
toxin called cuprizone. In these mice,
spontaneous myelin repair following discon- 
tinuation of cuprizone treatment was clearly 
accelerated by benztropine therapy. 

In MS, although most lesions remain per- 
manently demyelinated, certain lesions called 
shadow plaques show evidence of spontaneous
remyelination. But, as a rule, the renewed
myelin formation remains incomplete, with
axons wrapped only by solitary myelin segments,
or internodes, that are abnormally
thin. Furthermore, the degree of remyelination
does not directly correlate with disease
severity. This failed repair effort mostly
seems to reflect a failure of oligodendrocyte
progenitors to myelinate, rather than a lack of
locally available progenitors. Thus, one would
conclude that drugs that support myelin 
reformation are a more promising treatment 
for repair than the supply of additional pro- 
genitors, for example by cell transplants. The 
investigators conclude that benztropine acts 
mainly by promoting the differentiation of 
oligodendrocyte precursors to mature myelin- 
producing cells, although their results do not 
rule out local progenitor proliferation as an 
alternative or additional mechanism. 

There have been many different approaches 
to seeking drugs for MS. Some success- 
ful drugs were discovered by serendipity. 
Interferon-β, for example, was initially applied
with the intention to clear affected brains of
what were assumed to be pathogenic viruses.
Other drugs were found by directed screen-
ing or design: natalizumab, for example, is an
antibody designed to block the entry of inflam- 
matory cells to the brain. It was identified as 
effective in animal studies and from there 
moved directly into the clinic.

Deshmukh and colleagues opted for the 
latter approach, and their platform has been 
remarkably successful. But a few caveats 
remain. Benztropine was identified using 
rodent cells and models only, without a test 
stage using human cells. Will the drug’s anti- 
cholinergic activity induce the differentiation 
of human progenitor cells in the same way? In 
this context it is worth noting that not all anti- 
cholinergic agents tested by the authors were 
as effective at inducing MBP as benztropine. 
Furthermore, anti-cholinergic drugs are rou- 
tinely used to treat urinary complications in 
people with MS, but changes in the general
disease course of these patients have not been 
reported, to our knowledge. Finally, it remains 
to be seen how benztropine will affect remy- 
elination in people with MS, both at sites of 
active demyelination and in chronic, ‘burnt 
out’ lesions. 

Numerous therapeutic trials have tested a 
diversity of potential myelin-protective drugs, 
but so far with disappointing outcomes. Will
Deshmukh and colleagues’ study change this 
dire situation? Here, the authors keep a low 
profile. They cite the nature and severity of 
the side effects of benztropine that have been 
observed following its use in Parkinson’s dis-
ease. But the present results will motivate the
group to search for drug variants that maintain
benztropine’s virtues (its myelin-repairing
activity and its ability to cross the blood–brain
barrier) but lack its dark side.


Criteria for the use of omics-based predictors in clinical trials


Lisa M. McShane, Margaret M. Cavenagh, Tracy G. Lively, David A. Eberhard, William L. Bigbee, P. Mickey Williams,
Jeremy M. G. Taylor, Deborah J. Shuman, Richard M. Simon,


The US National Cancer Institute (NCI), in collaboration with scientists representing multiple areas of expertise relevant to 
‘omics’-based test development, has developed a checklist of criteria that can be used to determine the readiness of
omics-based tests for guiding patient care in clinical trials. The checklist criteria cover issues relating to specimens, assays, 
mathematical modelling, clinical trial design, and ethical, legal and regulatory aspects. Funding bodies and journals are 
encouraged to consider the checklist, which they may find useful for assessing study quality and evidence strength. The 
checklist will be used to evaluate proposals for NCI-sponsored clinical trials in which omics tests will be used to guide therapy.


High-throughput ‘omics’ technologies hold great promise to pro-
vide detailed characterization of diseases to more effectively
predict a patient’s clinical course or to select the most beneficial
therapies (see Box 1). These technologies have been embraced enthu-
siastically in oncology, as the heterogeneous character of malignant dis-
eases presents substantial challenges for cancer detection, prognosis and
optimal selection of therapy. Many preclinical studies using these tech-
nologies to elucidate biological features and mechanisms have been pub-
lished, and retrospective studies applying omics assays to stored human
biospecimens have been conducted to develop mathematical models to
predict clinical endpoints such as survival or response to therapy.

Despite numerous publications, however, few omics-based predictors
have been translated successfully into clinically useful tests. A factor that
contributes to the slow pace of clinical translation is the challenge of
assessing whether the body of evidence for an omics-based test is suffi-
ciently comprehensive and reliable that the test is ready for definitive
evaluation in a clinical trial in which it could be used to direct patient
care. Translation from research-grade omics assays to clinical-grade
omics-based tests requires a rigorous development and validation pro-
cess with attention to the complexities of omics assays and their applica-
tion to clinical specimens, specialized expertise required to appropriately
develop and evaluate mathematical predictor models built from high-
dimensional data, and multiple ethical, legal and regulatory issues.

Recently there have been some widely publicized cases of premature
advancement of omics-based tests to use in trials in which they were
used to guide patient treatment decisions. These cases led to calls for
examination of the field of translational omics. The Institute of Medicine
(IOM) conducted a study to review the field and formed the Committee
on the Review of Omics-Based Tests for Predicting Patient Outcomes in
Clinical Trials. The group’s task statement included recommending an
evaluation process for determining when omics tests are fit for use in
clinical trials and applying it to several specific cases of premature use of
omics-based tests. The resulting report laid out a three-phase process
for the development and evaluation of omics-based tests for use in
clinical trials: the discovery phase, the test validation phase, and the
evaluation for clinical utility and use stage.


During the IOM committee deliberations, the NCI convened a work- 
shop to bring together scientists and stakeholders who had an interest in 
this area of research to stimulate community dialogue. Subsequently, a 
working group was formed to develop a checklist that would operatio- 
nalize the principles set forth in the IOM report and the NCI workshop 
discussions. 

The results of those efforts are presented in Table 1, which lists 
30 criteria that should be addressed to determine the readiness of an omics 
test for use in a prospective clinical trial. These criteria apply to any 
clinical trial involving the investigational use of an omics test that will 
influence the clinical management of patients in the trial; for example, 
the selection of therapy. These criteria cover not only the strength of 
evidence in support of an omics test but also the practical issues that 
must be considered before the test is used in a clinical setting. The 
criteria can also be helpful in assessing the reliability and credibility of 
an omics predictor to justify its use on valuable non-renewable archived 
specimens collected from patients who were prospectively enrolled in 
previous clinical studies. This paper presents the criteria in checklist 
form with brief background. Readers are referred to a recently published 
companion paper for a more complete explanation and elaboration of
the rationale for each criterion. 


BOX 1
Definition of ‘omics’


In its report, Evolution of Translational Omics: Lessons Learned and the 
Path Forward, the Institute of Medicine Committee on the Review of 
Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials 
defines ‘omics’ as the study of related sets of biological molecules in a 
comprehensive fashion. Examples of omics disciplines include 
genomics, transcriptomics, proteomics, metabolomics and 
epigenomics. An omics-based test is defined as “an assay composed 
of or derived from multiple molecular measurements and interpreted 
by a fully specified computational model to produce a clinically 
actionable result”.


Table 1 | Criteria for the use of omics-based predictors in NCI-supported clinical trials

Domain: Specimen issues

1. Establish methods for specimen collection and processing and appropriate storage conditions to ensure the suitability of specimens for use with the omics test.
2. Establish criteria for screening out inadequate or poor-quality specimens or analytes isolated from those specimens before performing assays.
3. Specify the minimum amount of specimen required.
4. Determine the feasibility of obtaining specimens that will yield the quantity and quality of isolated cells or analytes needed for successful assay performance in clinical settings.

Domain: Assay issues

5. Review all available information about the standard operating procedures (SOPs) used by the laboratories that performed the omics assays in the developmental studies, including information on technical protocol, reagents, analytical platform, assay scoring, and reporting method, to evaluate the comparability of the current assay to earlier versions and to establish the point at which all aspects of the omics test were definitively locked down for final validation.
6. Establish a detailed SOP to conduct the assay, including technical protocol, instrumentation, reagents, scoring and reporting methods, calibrators and analytical standards, and controls.
7. Establish acceptability criteria for the quality of assay batches and for results from individual specimens.
8. Validate assay performance by using established analytical metrics such as accuracy, precision, coefficient of variation, sensitivity, specificity, linear range, limit of detection, and limit of quantification, as applicable.
9. Establish acceptable reproducibility among technicians and participating laboratories and develop a quality assurance plan to ensure adherence to a detailed SOP and maintain reproducibility of test results during the clinical trial.
10. Establish a turnaround time for test results that is within acceptable limits for use in real-time clinical settings.

Domain: Model development, specification, and preliminary performance evaluation

11. Evaluate data used in developing and validating the predictor model to check for accuracy, completeness, and outliers. Perform retrospective verification of the data quality if necessary.
12. Assess the developmental data sets for technical artefacts (for example, effects of assay batch, specimen handling, assay instrument or platform, reagent, or operator), focusing particular attention on whether any artefacts could potentially influence the observed association between the omics profiles and clinical outcomes.
13. Evaluate the appropriateness of the statistical methods used to build the predictor model and to assess its performance.
14. Establish that the predictor algorithm, including all data pre-processing steps, cutpoints applied to continuous variables (if any), and methods for assigning confidence measures for predictions, are completely locked down (that is, fully specified) and identical to prior versions for which performance claims were made.
15. Document sources of variation that affect the reproducibility of the final predictions, and provide an estimate of the overall variability along with verification that the prediction algorithm can be applied to one case at a time.
16. Summarize the expected distribution of predictions in the patient population to which the predictor will be applied, including the distribution of any confidence metrics associated with the predictions.
17. Review any studies reporting evaluations of the predictor’s performance to determine their relevance for the setting in which the predictor is being proposed for clinical use.
18. Evaluate whether clinical validations of the predictor were analytically and statistically rigorous and unequivocally blinded.
19. Search public sources, including literature and citation databases, journal correspondence, and retraction notices, to determine whether any questions have been raised about the data or methods used to develop the predictor or assess its performance, and ensure that all questions have been adequately addressed.

Domain: Clinical trial design

20. Provide a clear statement of the target patient population and intended clinical use of the predictor and ensure that the expected clinical benefit is sufficiently large to support its clinical utility.
21. Determine whether the clinical utility of the omics test can be evaluated by using stored specimens from a completed clinical trial (that is, a prospective-retrospective study).
22. If a new prospective clinical trial will be required, evaluate which aspects of the proposed predictor have undergone sufficiently rigorous validation to allow treatment decisions to be influenced by predictor results; where treatment assignments are randomized, provide justification for equipoise.
23. Develop a clinical trial protocol that contains clearly stated objectives and methods and an analysis plan that includes justification of sample size; lock down and fully document all aspects of the omics test and establish analytical validation of the predictor.
24. Establish a secure clinical database so that links among clinical data, omics data, and predictor results remain appropriately blinded, under the control of the study statistician.
25. Include in the protocol the names of the primary individuals who are responsible for each aspect of the study.

Domain: Ethical, legal and regulatory issues

26. Establish communication with the individuals, offices, and agencies that will oversee the ethical, legal, and regulatory issues that are relevant to the conduct of the trial.
27. Ensure that the informed consent documents to be signed by study participants accurately describe the risks and potential benefits associated with use of the omics test and include provisions for banking of specimens, particularly to allow for ‘bridging studies’ to validate new or improved assays.
28. Address any intellectual property issues regarding the use of the specimens, biomarkers, assays, and computer software used for calculation of the predictor.
29. Ensure that the omics test is performed in a Clinical Laboratory Improvement Amendments-certified laboratory if the results will be used to determine treatment or will be reported to the patient or the patient’s physician at any time, even after the trial has ended or the patient is no longer participating in the study.
30. Ensure that appropriate regulatory approvals have been obtained for investigational use of the omics test. If a prospective trial is planned in which the test will guide treatment, consider a pre-submission consultation with the US Food and Drug Administration.


Specimen issues 


Molecular profiles generated by the use of omics technologies can be
sensitive to specimen collection, processing and storage conditions.
Investigators should consider the conditions under which specimens
used in developmental studies were collected and handled to assess
the robustness of an omics test to various specimen conditions. It may
be necessary to conduct additional feasibility studies to document that
the omics test will perform satisfactorily under the range of conditions in
which the specimens will be obtained and stored in typical clinical set-
tings; alternatively, more restrictive requirements for specimen collec-
tion, processing and storage should be clearly specified before the test is
used in a clinical trial or other clinical validation study.

Criteria for specimen quality, amount (mass or volume), and com- 
position should be clearly specified in order to qualify a specimen or its 
isolated analytes as suitable for assay by the omics test. Appropriate 
criteria will depend on the specimen type and the particular omics assay 
platform to be used. Details of the specification might include per cent 
purity of the target cells or intact analyte of interest and specific mass or 
volume of the specimen or analytes isolated from the specimen. It should 
be established that it is feasible to achieve these criteria in clinical settings. 


Assay issues 


Variations in assay procedures due to differences in technical protocols, 
reagents, and scoring and reporting methods can have a substantial 
impact on the analytical performance of an omics assay and its compar-
ability among laboratories. Many omics tests are developed using data
from retrospective studies in which these aspects of the assay were not 
standardized. This can lead to uncertainties in how the test will perform 
when based on assay data from a new laboratory, including the laboratory 
or laboratories that will generate the assay data for a prospective trial. It is 
important to develop detailed standard operating procedures (SOPs) for 
the assay underlying the omics test and to establish that studies conducted 
previously to clinically validate the omics test were based on data expected 
to be comparable to new data generated under the specified SOPs. 

Analytical performance of the omics assay under the proposed SOPs 
must be documented and found to be acceptable in terms of metrics such 
as accuracy, precision, coefficient of variation, sensitivity, specificity, 
linear range, limit of detection, and limit of quantification, as applicable. 
Calibrators, analytical standards, and controls are essential components 
of the SOPs and should be described clearly. Quality assurance proce- 
dures should include criteria for acceptance or rejection of assay batches 
and results from individual specimens. When multiple technicians or 
laboratories will conduct the assays, monitoring procedures should be in 
place to ensure comparability across technicians and laboratories. Methods 
for assay scoring and reporting should be clearly specified. Turnaround 
times for return of test results should be within acceptable limits that will 
be dependent on the particular clinical situation and should be suffi- 
ciently rapid to not impede clinical management timelines. Feasibility 
studies to assess assay analytical performance, reproducibility and turn- 
around times may be required in advance of initiating a clinical trial to 
firmly establish the suitability of the omics test for use in a real-time 
clinical setting. 
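
As an illustration of the acceptance criteria described above, the following minimal sketch computes a coefficient of variation for replicate control measurements and applies a simple batch accept/reject rule. The function names and the thresholds (`max_cv_pct`, `max_bias_pct`) are hypothetical values chosen for the example; they are not taken from the checklist, and real limits would have to be set for each assay and clinical setting.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV: 100 * sample standard deviation / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


def batch_is_acceptable(control_replicates, expected,
                        max_cv_pct=15.0, max_bias_pct=10.0):
    """Accept a batch only if the control replicates are both precise
    (low CV) and accurate (mean close to the expected control value).
    Both thresholds are illustrative assumptions, not published limits."""
    cv = coefficient_of_variation(control_replicates)
    bias = 100.0 * abs(mean(control_replicates) - expected) / abs(expected)
    return cv <= max_cv_pct and bias <= max_bias_pct


if __name__ == "__main__":
    good_batch = [9.8, 10.1, 10.0, 10.2]   # tight replicates near 10.0
    noisy_batch = [8.0, 10.0, 12.0]        # same mean, much larger spread
    print(batch_is_acceptable(good_batch, expected=10.0))
    print(batch_is_acceptable(noisy_batch, expected=10.0))
```

A rule of this kind is only one component of a quality assurance plan; results from individual specimens would need their own acceptance criteria.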


Model development and evaluation 


Many omics tests are developed using existing omics, clinical and patho- 
logy data or using data generated from retrospective specimen collections. 
These data may be incomplete or unreliable and should be examined for 
errors, inconsistencies or bias. Omics assays can be sensitive to a variety of 
ancillary technical influences that result in artefacts in the generated data. 
Of particular concern is the potential for such artefacts to be confounded 
with clinical variables or endpoints. Efforts should be made to identify 
potential confounders, including source of specimens (for example, clini- 
cal sites processing specimens differently), laboratory performing the omics 
assay, and assay batches.
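
One simple way to screen for the confounding described above is to cross-tabulate a technical variable (such as assay batch) against the clinical endpoint and check whether the two are associated. The sketch below computes a Pearson chi-square statistic for that table. The choice of statistic, the function names and the toy labels are assumptions made for illustration; other association measures would serve equally well.

```python
from collections import Counter


def crosstab(batches, outcomes):
    """Count specimens for every (batch, outcome) pair."""
    return Counter(zip(batches, outcomes))


def chi_square(batches, outcomes):
    """Pearson chi-square statistic for independence of batch and outcome.
    A value of zero means the outcome mix is identical in every batch;
    large values suggest batch and outcome are entangled."""
    n = len(batches)
    table = crosstab(batches, outcomes)
    batch_totals = Counter(batches)
    outcome_totals = Counter(outcomes)
    stat = 0.0
    for b, nb in batch_totals.items():
        for o, no in outcome_totals.items():
            expected = nb * no / n
            stat += (table[(b, o)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    batches = ["A"] * 10 + ["B"] * 10
    # Every responder was assayed in batch A: batch and outcome are inseparable.
    confounded = ["responder"] * 10 + ["non-responder"] * 10
    # Responders and non-responders are spread evenly over both batches.
    balanced = ["responder", "non-responder"] * 10
    print(chi_square(batches, confounded))
    print(chi_square(batches, balanced))
```

When batch and outcome are perfectly aligned, as in the first toy data set, no analysis can distinguish a biological signal from a batch artefact.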

Examples of flawed applications of statistical approaches for develop- 
ment of omics predictors and for evaluation of their performance are 
abundant in the literature. Model overfitting, which occurs when a
statistical model describes random noise instead of capturing the true 
association between predictor variables and a clinical endpoint, is a com- 
mon problem in omics research projects, in which the number of analytes 
measured per specimen exceeds the number of specimens studied. 
Overfitting can be reduced by the use of model ‘regularization’ approaches 
that constrain the complexity of the model, but these approaches do not 
completely eliminate overfitting risk. It is common for researchers without
the appropriate expertise to misunderstand and misapply modelling tech-
niques. In addition, if flawed methods for model performance assessment
are used, then overfitting may escape detection. A common mistake is 
failure to maintain strict separation between data used to build a model 
(‘training set’) and data used to assess model performance (‘testing set’). 
Numerous published papers have inappropriately reported model per- 
formance estimates based on resubstitution of data used to build a model 
back into that same model. These so-called ‘resubstitution estimates’ are 
severely (optimistically) biased. Assessment of model performance on the 
combined training and testing data sets is similarly problematic. Re-use of 
training data is acceptable only if performed properly using data resam-
pling methods that iteratively split the training data to hold out subsets of
the data that are not used for model building and can therefore be used to 
check model performance. 
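
The optimistic bias of resubstitution estimates can be seen in a small simulation. The sketch below, which is an illustration under stated assumptions rather than a method from the checklist, generates pure-noise data with far more features than specimens, fits a nearest-centroid classifier (chosen here only for simplicity), and compares resubstitution accuracy with k-fold cross-validated accuracy. All names, sample sizes and the seed are hypothetical.

```python
import random


def fit_centroids(rows, labels):
    """Return {label: centroid vector} computed from the training rows."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(row)
            counts[lab] = 0
        acc = sums[lab]
        for j, v in enumerate(row):
            acc[j] += v
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def resubstitution_accuracy(rows, labels):
    """Flawed estimate: the model is scored on the data used to build it."""
    model = fit_centroids(rows, labels)
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def cross_validated_accuracy(rows, labels, k=5):
    """k-fold estimate: each specimen is scored by a model that never saw it."""
    n = len(rows)
    hits = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit_centroids([rows[i] for i in train],
                              [labels[i] for i in train])
        hits += sum(predict(model, rows[i]) == labels[i] for i in test)
    return hits / n


def simulate_noise(n=40, p=2000, seed=0):
    """Pure-noise 'omics' data: features carry no information about labels."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    labels = [i % 2 for i in range(n)]
    return rows, labels


if __name__ == "__main__":
    rows, labels = simulate_noise()
    print("resubstitution:", resubstitution_accuracy(rows, labels))
    print("cross-validated:", cross_validated_accuracy(rows, labels))
```

Because the labels are unrelated to the features, an honest estimate should sit near chance (0.5), whereas the resubstitution estimate is expected to be close to perfect. Note that any data-driven step, such as feature selection, would also have to be repeated inside each fold for the cross-validated estimate to remain valid.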

Development of an omics predictor can be an iterative process invol- 
ving several adjustments to improve performance. With regard to the 
three phases of the development and evaluation process in the IOM 
report on omics tests, it is noted in the report that preliminary valida-
tions may occur in the test validation phase, and the definitive evalu-
ation of clinical utility takes place in the final phase. It is important to be
able to discern the point at which the omics test is ‘locked down’, or 
finalized, in all aspects, including specimen requirements, technical 
protocol for assay, data preprocessing, the form of mathematical pre- 
dictor model, and interpretation of the test result. The test is then ready to 
enter the final evaluation for clinical utility and use stage, at which there 
are three basic options for clinical utility evaluation: first, a prospective 
evaluation of the omics test on a retrospective specimen collection from a 
clinical trial or prospective cohort study; second, a prospective clinical trial 
in which the test does not direct patient management; and third, a prospec- 
tive clinical trial in which the omics test is used to direct patient manage- 
ment. Ideally, there should have been a blinded and rigorous preliminary 
validation of performance of the locked-down model on an external inde- 
pendent specimen set during the test validation phase. If an independent 
external validation set is not possible because adequate specimen collections 
do not exist, then existing performance evaluations based on internal vali- 
dations should be carefully reviewed to ensure that they were rigorous and 
used appropriate methods. In this situation, it may be necessary to use a 
clinical trial design that does not allow the test to influence patient care. 

When further adjustments are made to the omics test or data after the 
final validation data have been unblinded, there is a risk of comprom- 
ising the validation. If the omics test is adjusted, either a new validation 
must be performed or additional evidence must be obtained; for example, 
by conducting an assay-bridging study to ensure that the adjustments to 
the test have not adversely affected its performance. 

Investigators should be prepared to supply data and computer code as 
part of the review process for proposals to use omics tests in clinical 
trials. It is highly recommended that investigators follow reproducible 
research practices so that they will be able to supply the needed informa- 
tion quickly and easily for verification of the validation of the test and its 
locked-down form. Readers are referred to the companion publication
for further discussion of recommended reproducible research practices. 


Clinical trial design 

A clinical trial for definitive evaluation of an omics test should be con- 
ducted using the same rigorous standards expected for clinical trials evalu- 
ating experimental therapies. In some circumstances, high-level evidence 
can be obtained by use of specimens from an already-completed clinical 
trial. Accepted standards for good clinical practice must be followed,
including development of a formal protocol with clearly stated objectives 
and eligibility criteria, an informatics plan for management of clinical 
and omics data, a pre-specified study design and statistical analysis plan,
complete specification of the omics test, and justification for equipoise for 
any treatment randomizations (if the trial is conducted prospectively). 
The study team must include individuals with appropriate expertise to 
assume responsibility for the clinical, laboratory, pathology, bioinfor-
matics, data management and statistical aspects of the study.


Ethical, legal and regulatory issues


Numerous ethical, legal and regulatory issues must be addressed in the 
course of developing an omics test for clinical use. Research involving 
human subjects, which includes retrospective use of specimens from 
living subjects, requires that adequate protection is in place to ensure 
the safety of patients and the privacy and confidentiality of patient 
information. Ensuring appropriate protections has become more chal-
lenging as omics technologies make it possible to provide detailed gene-
tic characterizations of individuals and much research data are made
publicly available. Informed consent documents for a clinical trial using 
an omics test to guide patient management must accurately describe any 
potential risks from participation in a study and all potential conflicts of 
interest on the part of study investigators or sponsoring institutions. 
Laboratory tests must be conducted in environments that meet Clinical 
Laboratory Improvement Amendments certification requirements if the 
results will be reported to the patient or the patient’s physician.
Responsible parties at participating institutions (for example, institutional 
review boards, protocol review committees), trial sponsors (for example, 
the NCI, universities, companies), and the US Food and Drug Administra-
tion (FDA) (for example, for Investigational Device Exemption (IDE) or
Investigational New Drug applications) must be fully informed of study
details and approve the study before it proceeds. 

If the omics assay to be used in a clinical trial could be considered a 
significant-risk assay, including—but not limited to—one used to choose 
among treatments, investigators must consult with the FDA to determine 
whether an IDE from the Center for Devices and Radiological Health, or a
similar evaluation carried out through the Investigational New Drug process, 
is required. The complexities of omics-based tests, together with the FDA’s 
evolving view of regulatory enforcement discretion for these tests, make it 
important to have early communications with the FDA. Investigators may 
find it helpful to discuss the trial formally with the FDA in a pre-submission 
process if they are not familiar with IDE requirements.

Intellectual property issues may apply to the use of the specimens, 
biomarkers, assays, and computer software used for calculation of the 
predictor. Intellectual property rights should be documented and 
respected by all parties involved. Potential conflicts of interest of study 
investigators must be disclosed and managed. 


Summary 

Evaluation of the readiness of an omics test to be used for clinical care 
requires careful consideration of the body of evidence supporting the 
test’s analytical and clinical validity and potential clinical utility, as well 
as an understanding of ethical, legal and regulatory issues. Funding 
bodies and journals are encouraged to consider using the checklist as 
an evaluation guide in their review processes. The NCI plans to use the 
checklist presented here to evaluate proposals for the use of omics tests in 
clinical trials where the test will be used to guide patient care. Although it 
is not expected that exploratory studies using omics assays or studies 
aiming to develop omics tests will meet all of the checklist criteria, the 
checklist does provide a convenient framework by which to assess the 
stage of development of an omics test and the strength and quality of 
the accumulated evidence. Several of the checklist criteria (those that are 
not specific to the development of models from high-dimensional data) 
also apply to studies of single biomarkers, or limited panels of biomarkers, 
measured by a variety of conventional assay methods. The checklist may, 
therefore, serve as a useful reference in a variety of review settings. 

It is hoped that this 30-point checklist will guide investigators towards 
the use of best practices in omics test development, help them to more 
reliably evaluate the quality of evidence in support of omics tests, and 
assist them in planning appropriately for the clinical use of omics pre- 
dictors. The ultimate goal is to develop a more efficient, reliable and 
transparent process to move omics assays from promising research 
results to clinically useful tests that improve patient care and outcome.


Olivine crystals align during diffusion creep of Earth’s upper mantle


Tomonori Miyazaki, Kenta Sueyoshi & Takehiko Hiraga


The crystallographic preferred orientation (CPO) of olivine produced during dislocation creep is considered to be the 
primary cause of elastic anisotropy in Earth’s upper mantle and is often used to determine the direction of mantle flow. A 
fundamental question remains, however, as to whether the alignment of olivine crystals is uniquely produced by dis- 
location creep. Here we report the development of CPO in iron-free olivine (that is, forsterite) during diffusion creep; 
the intensity and pattern of CPO depend on temperature and the presence of melt, which control the appearance of 
crystallographic planes on grain boundaries. Grain boundary sliding on these crystallography-controlled boundaries 
accommodated by diffusion contributes to grain rotation, resulting in a CPO. We show that strong radial anisotropy is 
anticipated at temperatures corresponding to depths where melting initiates to depths where strongly anisotropic and 
low seismic velocities are detected. Conversely, weak anisotropy is anticipated at temperatures corresponding to depths 
where almost isotropic mantle is found. We propose diffusion creep to be the primary means of mantle flow. 


Observations of anisotropy in the velocity of seismic waves have given
us a dynamic view of Earth’s interior. Interpretation of seismic aniso-
tropy in the mantle is based on our understanding of mineral physics
and the anisotropic deformation characteristics of the olivine lattice.
Dislocation creep, which is considered to be one of the main deforma-
tion mechanisms in Earth’s interior, involves not only slip on specific
crystallographic planes, but also grain rotation. As a consequence, the
primary crystal slip plane and axis align with the direction of mantle
flow. All minerals are elastically anisotropic to some extent, such that
crystallographic alignment of minerals with a specific direction results 
in anisotropy in elastic-wave velocity. Consequently, observed seismic 
anisotropy is often interpreted as the result of slip on a particular slip 
system. The recent discovery of the various patterns of CPO of olivine 
formed under various experimental conditions such as different stresses, 
pressures and water concentrations has resulted in new interpretations 
of seismic anisotropy and CPO observed in nature. The common
principle underlying these interpretations is that the anisotropy is a con- 
sequence of dislocation creep. Here we describe experimental evidence 
showing that significant olivine crystallographic alignment occurs dur- 
ing diffusion creep, which may alter the simple, long-held view of CPO 
development in Earth’s interior. 


Creep tests and microstructural analyses 


We conducted uniaxial tensile and compression creep experiments on 
fine-grained iron-free olivine (that is, forsterite) plus 20 vol.% diopside 
(a combination that we denote Fo+Di) and the same compression tests
on forsterite plus 20 vol.% anorthitic melt (An) at atmospheric pressure,
temperatures of 1,150–1,385 °C (the solidus temperature, T_s, for
Fo+Di is 1,382 °C) and strain rates of é = 10 °-10 *s 1}. We antici- 
pated that tension experiments would give us an indication of the easy 
slip direction and that compression experiments would indicate the 
easy slip plane for forsterite if we detected a CPO in our samples. 

In all the tests, a piece of the same starting material was placed next 
to the creep sample but not under load, to observe changes in the micro- 
structure due to static annealing (we refer to this sample as a reference 
sample). We rapidly changed displacement rates (v) during the com- 
pression experiments, to obtain the stress dependence of the strain rate. 


Unless melt was involved, we also conducted independent compression
experiments at lower but constant values of v and to a constant
final strain (ε = 0.6). These constant-v experiments were conducted to
eliminate any effect of changing strain rate and strain on the sample
microstructure, especially CPO. Typical stress (σ)–strain (ε) curves showed
strain hardening, which is attributed to grain growth (Extended Data
Fig. 1). Such growth is identified in the change in grain size (d) measured
before and after the creep experiments. Grain growth during the
experiments was estimated using static and dynamic grain growth laws
(Methods). Rheological data were analysed on the basis of the power-law
relationship $\dot{\varepsilon} = A(\sigma^{n}/d^{p})$, where $\dot{\varepsilon}$ is the strain rate,
A is a constant and n and p are respectively the stress and grain size
exponents. We estimate that p = 1–2 from
the relationship between sample hardening and the estimated grain
growth during the deformation. By using p = 2 to normalize the grain
size to 1 µm in all samples, including those from individual experiments
conducted for CPO analyses, we find a linear dependence between strain
rate and stress with n = 1.1 ± 0.2 for temperatures from 1,200 to 1,350 °C
(Fig. 1).
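
As a rough illustration of the normalization and fit described above, the following Python sketch rescales a measured strain rate to a grain size of 1 µm with p = 2 and estimates n as the slope of log strain rate against log stress. The function names and the synthetic stress data are assumptions made for illustration. Only the KF-171 strain rate and grain size are taken from Extended Data Table 1, and treating 0.28 µm as the relevant grain size for that sample is itself an assumption.

```python
import math

def normalize_strain_rate(strain_rate, grain_size_um, p=2.0, ref_size_um=1.0):
    """Rescale a strain rate measured at grain size d to a reference grain size,
    assuming strain rate is proportional to d**(-p)."""
    return strain_rate * (grain_size_um / ref_size_um) ** p

def fit_stress_exponent(stresses_mpa, strain_rates):
    """Return the least-squares slope n of log10(strain rate) vs log10(stress)."""
    xs = [math.log10(s) for s in stresses_mpa]
    ys = [math.log10(r) for r in strain_rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# KF-171 (Extended Data Table 1): 1.9e-5 /s, grain size taken as 0.28 um
kf171_normalized = normalize_strain_rate(1.9e-5, 0.28)  # roughly 1.5e-6 /s

# Synthetic, hypothetical data obeying strain_rate = A * stress**1
synthetic_stress = [10.0, 20.0, 40.0, 80.0]
synthetic_rate = [2.0e-7 * s for s in synthetic_stress]
n_fit = fit_stress_exponent(synthetic_stress, synthetic_rate)
```

A slope close to 1 in such a fit is what the text interprets as the signature of diffusion creep, whereas dislocation creep would give a markedly larger exponent.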

In Fig. 2, the crystallographic orientations of more than 300 forsterite
grains, taken from the centre part of each sample, are plotted on
lower-hemispheric projections, where the x (or y) and z axes respectively
correspond to directions perpendicular and parallel to the direction
of applied force. Because we conducted uniaxial tests, the x and
y axes are equivalent. We found a systematic change in CPO patterns
with temperature. At T = 1,250 °C in Fo+Di samples deformed in
compression, the forsterite c axes ([001]) are weakly oriented in the
z direction and the a axes ([100]) are weakly but homogeneously oriented
in the x–y plane (we denote this as an a girdle), whereas in samples
deformed in tension, the a axes are weakly oriented in the z direction.
CPOs developed in this temperature range are very weak (J index <3;
see Extended Data Table 1). The J index is a measure of the density
distribution of the crystallographic orientations: J is unity for a random
distribution and infinity for a single crystal (perfect CPO). Nevertheless,
the observed CPO patterns are characteristic of all the samples deformed
under these conditions. At 1,300–1,350 °C, the b axes ([010]) are strongly
oriented in the z direction and a and c girdles form in the x–y plane in
samples deformed in compression,

Figure 1 | Strain rate as a function of stress. Data from Fo+Di samples at five
different temperatures and from the Fo+An sample at 1,260 °C are presented.
Open symbols represent the data from samples that were used to determine 
the crystallographic orientation of forsterite grains shown in Fig. 2. Three 
different slopes corresponding to three different stress exponents (n = 1-3) 
are shown at lower right. 

Figure 2 | Lower-hemispheric projections of the crystallographic
orientations of forsterite grains. The Fo+Di orientations are at a strain of
ε = 0.6 (ε = 1.0 for the partly molten Fo+Di) and are deformed at constant
displacement rate (v). The Fo+An orientations are at ε = 0.4 and are deformed

whereas in samples deformed in tension, the a axes are oriented towards
the z direction and b girdles form in the x–y plane. The CPO formed in
samples deformed at 1,250 °C has characteristics of the patterns observed
for samples deformed at T < 1,250 °C and T = 1,300 °C. At 1,385 °C,
where melting of all diopside grains and some of the forsterite produces
samples of Fo plus 30 vol.% melt, we observe very weak orientation of
b axes in the z direction, which was also identified in Fo+An samples.
Most of the CPOs are weak, but their intensity increases with strain,
as confirmed in a sample deformed at 1,330 °C (Extended Data Fig. 2).
CPO patterns change systematically with temperature and the presence
of melt. To reflect this, in what follows we refer to subsolidus temperatures
of T = 1,250 °C as ‘low T’, subsolidus temperatures between 1,250 °C
and T_s as ‘high T’, and temperatures at which the samples are melt
bearing as ‘>T_s’.
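
The J index quoted for these CPOs can be illustrated on a discretized orientation distribution. The sketch below assumes that orientation space has been divided into cells, each occupying a known fraction of the space, and uses the usual texture-index form J = Σ f_i² w_i with the density f normalized to a mean of 1. The function name and the four-cell example are hypothetical, and this is not necessarily the procedure implemented in the software used for Fig. 2.

```python
def j_index(densities, cell_fractions):
    """Texture (J) index of a discretized orientation distribution.

    densities: orientation density in each cell (any positive scale).
    cell_fractions: fraction of orientation space occupied by each cell
                    (assumed to sum to 1).
    """
    mean_density = sum(f * w for f, w in zip(densities, cell_fractions))
    normalized = [f / mean_density for f in densities]  # mean density becomes 1
    return sum(f * f * w for f, w in zip(normalized, cell_fractions))

cells = [0.25, 0.25, 0.25, 0.25]
j_random = j_index([1.0, 1.0, 1.0, 1.0], cells)  # uniform distribution
j_peaked = j_index([1.0, 0.0, 0.0, 0.0], cells)  # all grains in one cell
```

For the uniform case the index is 1, and concentrating all orientations into ever smaller cells makes it grow without bound, which matches the limits of unity and infinity stated above.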

The CPO patterns were identified from samples deformed at the
lowest stresses for each temperature where linear relationships between
stress (σ) and strain rate were observed under all experimental conditions (Fig. 1). Thus, these
samples were necessarily deformed by a diffusional creep mechanism,
and it is difficult to explain the CPO patterns as a result of dislocation
creep on specific slip systems in forsterite. In the reference samples,
forsterite grains at low T have an equigranular (that is, isotropic) shape,
whereas grains at high T develop a weakly to strongly elongated (tabular)
shape (Fig. 3). Grains in >T_s samples have a polygonal-equigranular
shape. These characteristic shapes align towards the direction of maximum
strain in the deformed samples. In Fig. 4, we plot the J index as a
function of aspect ratio for the reference samples determined from the

at constant force (F). N, number of measured grains. Measurements were made
by scanning electron microscopy (SEM) and SEM electron backscatter
diffraction (EBSD). pfJ, sharpness of a pole figure.

Figure 3 | SEM backscatter images of reference and deformed samples of
Fo+Di, partly molten Fo+Di and Fo+An. Dark-grey grains are forsterite
and light-grey grains are diopside. Diopside grains are weakly aggregated and 
aligned in the direction of compression, whereas grains of forsterite and 


CPOs shown in Fig. 2. At low T, the grains have a constant aspect ratio 
of 1.3-1.4, whereas at high T the aspect ratio increases with tempera- 
ture. The aspect ratio is smaller in melt-bearing samples. The aspect 
ratios in reference samples and J index values from high-T deforma- 
tion experiments correlate positively, indicating that anisotropic grain 
growth with temperature controls the CPO. 

In addition to grains with highly anisotropic shape appearing even 
in the reference samples, grain boundaries are frequently straight and 
parallel on opposite sides of the grains. We found that the long axes 
of forsterite grains in the high-T reference and deformed samples are 
frequently perpendicular to [010] and occasionally to [001], and that 

Figure 4 | J index determined from EBSD analyses of the deformed samples
as a function of aspect ratio of forsterite grains in the reference sample. Data
come from both tension experiments (open symbols) and compression
experiments (filled symbols). A different effect of the aspect ratio on J index is
observed in compression and tension samples owing to the strong alignment of
b axes in the z direction in high-T compression samples (Fig. 2).

diopside phases are weakly to strongly elongated and aligned in the tensile
direction. All the reference samples were annealed during the tensile 
experiments. In the melt-bearing samples (KF-209 and KF-186), the melt phase 
is light grey or dark grey and the forsterite grains are darker still. 


they are parallel to [100] (Extended Data Fig. 3). Consequently, the 
longest straight grain boundaries parallel to the long axes of forsterite 
grains in the high-T samples also lie in the (010) plane (the plane per- 
pendicular to the b axis) (Fig. 3), a conclusion that is supported by our 
transmission electron microscopy study (Extended Data Fig. 4). This
type of boundary is referred to as a low-index plane boundary and
has been previously identified in iron-bearing olivine aggregates.
Forsterite grains at low T and >T_s have aspect ratios low enough that it
was difficult to analyse their shape with respect to their crystallographic
axes. However, especially in >T_s samples, forsterite has straight interfaces
indicative of the crystal habit. Thus, we conclude that the development
of low-index plane boundaries controls the grain shape and, furthermore,
the development of CPO during deformation at all temperature
conditions.


Condition for CPO formation 

We observe aggregation of like phases in the direction of compression 
in melt-free samples (Fig. 3). This aggregation is due to significant grain 
boundary sliding (GBS), which allows isolated grains of the secondary 
phase to contact each other through grain switching events. In addition,
Fo+Di samples at high T exhibited superplasticity (that is, a tensile
strain of >>100%; Extended Data Fig. 5), which also requires a significant
contribution from GBS. Thus, it is reasonable to consider CPO
development in a regime where GBS is accommodated by diffusion.
Because the orientation of many grain boundaries is crystallography
controlled, GBS should frequently occur on specific crystallographic
planes and even in specific directions. Preferential GBS on specific boundaries
contributes to grain rotation followed by an alignment of easy
slip grain boundaries in the direction of flow (ref. 16; Extended Data Fig. 6).
As a result, we can identify an apparent easy slip plane and direction 
for GBS from the CPO patterns, which can be compared with results 
on dislocation creep showing that slip occurs along preferential inter- 
granular crystallographic planes. 

Our proposal that CPO is controlled by grain shape is strongly supported
by the characteristic high-T CPO pattern formed during low-T
compression experiments, where the starting material was prepared at
high T (Extended Data Fig. 7). Little grain shape modification is expected
after the saturation of grain size at high T (Extended Data Table 1), and,
consequently, the CPO pattern must have developed as a result of the
inherited high-T grain shape. Essentially the same observations and conclusions
have been made for CPO formation in deformed alumina aggregates,
which yielded crystallographically controlled anisotropic alumina
grains in a certain sintering temperature range and deformed by means
of GBS plus grain rotation without the help of dislocation glide. In silicates,
the development of CPO during deformation with a stress exponent
of ~1 has been identified previously and was thought to be due
to a contribution from dislocation glide or chemical reaction, neither of
which is required to explain CPO development in our experiments.

Assuming that the observed CPO patterns reflect the elongated forsterite
shape, we estimate the relative axial dimensions of forsterite as
a > b > c at low T, indicating that the plane perpendicular to the c axis
(the ‘c plane’) defines the most developed grain boundary elongated in
the a direction, and as a > c > b at high T, where the b plane is the most
developed grain boundary elongated in the a direction. For samples
deformed in compression at T > T_s, it is difficult to estimate the forsterite
dimensions from CPO patterns alone. However, if we use the
previously known euhedral shape of forsterite with its free surfaces
(that is, the crystal–fluid or crystal–melt interfaces), then we predict
that c > a > b at T > T_s, where the b plane is the most developed grain
boundary elongated in the c direction, which is consistent with the
observed CPO patterns and grain shape.

We have identified anisotropic forsterite grains similar to those observed 
in high-T samples in the forsterite system with different chemical environ- 
ments. In calcium-free samples of forsterite plus enstatite (MgSiO3) at 
T = 1,360 °C, we observed equigranular grains similar to those present at
low T (see figure 1 in refs 8, 21), whereas in annealed samples at 1,400 °C 
we identified elongated grains (data not shown). In a system of forsterite 
plus MgO, we observed equigranular grains with almost random to weak 
CPO even at 1,450 °C, which respectively resemble the grain shape and 
CPO pattern seen at low T in this study’. In aggregates of forsterite plus 
diopside (plus water) at 1,200 °C and 1.2 GPa, very anisotropic (tabular) 
forsterite grains were formed. All these results demonstrate that
equigranular or anisotropic forsterite grains form under different conditions
in different forsterite-bearing systems. However, if we consider
the experimental temperatures relative to T_s for each system (Supplementary
Information), all the observed grain shapes are well explained by a
transition temperature of ~0.92T_s (in K), with equigranular and anisotropic
grains appearing below and above 0.92T_s, respectively. As a
consequence, we can simply think of low T and high T as T < 0.92T_s
and 0.92T_s < T < T_s, respectively, where T_s changes with different
geological conditions including different chemical environments. At
present, we have not determined a physical explanation for this transition
temperature. However, it is reasonable that the shape transition is
related to T_s, because elements that can reduce T_s are predicted to
segregate to olivine grain boundaries. Some elements, for example
calcium in this study, are confirmed to partition into olivine boundaries
in a chemically equilibrated state. Structural transitions of the
grain boundary, including pre-melting at the boundary below T_s, have
also been identified in some materials, where such transitions should
occur within a scale of 1 nm width (Extended Data Fig. 4).
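
The homologous-temperature criterion above is easy to check numerically. The following sketch converts temperatures to kelvin and classifies an experiment as low T, high T or melt bearing using the 0.92 ratio from the text. The function names are hypothetical, and the handling of temperatures exactly at a boundary is an arbitrary choice made here.

```python
def kelvin(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15

def transition_celsius(solidus_celsius, ratio=0.92):
    """Temperature (deg C) at which T equals ratio * T_s, evaluated in kelvin."""
    return ratio * kelvin(solidus_celsius) - 273.15

def temperature_regime(t_celsius, solidus_celsius, ratio=0.92):
    """Classify a temperature relative to the solidus of the system."""
    t_k = kelvin(t_celsius)
    ts_k = kelvin(solidus_celsius)
    if t_k >= ts_k:  # boundary case assigned to the melt-bearing regime
        return "melt-bearing"
    if t_k >= ratio * ts_k:
        return "high T"
    return "low T"

# Fo+Di, with the solidus of 1,382 deg C given in the text
fo_di_transition = transition_celsius(1382.0)
regimes = [temperature_regime(t, 1382.0) for t in (1200.0, 1300.0, 1385.0)]
```

For Fo+Di this places the transition at about 1,250 °C, in line with the temperature at which the CPO patterns were observed to change.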

Although our CPO results come from uniaxial deformation tests, we 
can estimate what type of CPO patterns will be formed during simple 
shear, which is more relevant to the deformation geometry in Earth’s 
interior. The type of fabric was previously classified for olivine depending
on flow and crystallographic geometry. At high T, A-type fabric
(that is, (010)[100], in which the olivine b plane is almost parallel to the
shear plane and the a axis is almost parallel to the shear direction), which
is the most common fabric found in nature, will be formed. At low T,
despite its weak development, E-type fabric (that is, (001)[100]), which
has also been identified in nature, will be formed. At T > T_s, we
predict B-type fabric (that is, (010)[001]) on the basis of the grain shape
observed at this condition.
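
The reasoning from grain shape to fabric type can be written as a small lookup. The sketch below assumes, following the argument in the text, that the shortest grain dimension is normal to the most developed boundary (the apparent slip plane) and that the longest dimension gives the apparent slip direction. The function and table names are hypothetical, and only the three fabric types named above are included.

```python
AXIS_TO_MILLER = {"a": "100", "b": "010", "c": "001"}

# (slip plane, slip direction) -> fabric type, limited to the types named in the text
FABRIC_TYPES = {
    ("010", "100"): "A",
    ("010", "001"): "B",
    ("001", "100"): "E",
}

def predicted_fabric(dimension_order):
    """Predict the fabric from grain axes ordered from longest to shortest.

    dimension_order: e.g. ("a", "c", "b") for a > c > b.
    Returns the slip system as text and the fabric label.
    """
    longest, shortest = dimension_order[0], dimension_order[-1]
    plane = AXIS_TO_MILLER[shortest]      # most developed boundary plane
    direction = AXIS_TO_MILLER[longest]   # elongation direction
    label = FABRIC_TYPES.get((plane, direction), "unclassified")
    return f"({plane})[{direction}]", label

low_t = predicted_fabric(("a", "b", "c"))
high_t = predicted_fabric(("a", "c", "b"))
above_solidus = predicted_fabric(("c", "a", "b"))
```

The three calls reproduce the E-, A- and B-type assignments made for low T, high T and T > T_s respectively.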


Discussion 


In Earth’s interior, changes in temperature from T < 0.92T_s to T > T_s
occur mainly along adiabatic gradients during upwelling (that is, as pressure
decreases) and as a result of changing chemical environments (for
example, an increase in water content). Assuming (1) an adiabatic temperature
gradient in the uppermost mantle of 0.5 K km⁻¹, (2) a melting
point, T_s = 1,650–1,900 K, depending on the water concentration, and
(3) geologic settings where upwelling is occurring, such as beneath a
ridge or a plume, we estimate that the region in which T = 0.92T_s will
vary in depth from the point of initial melting to 50–100 km deeper.
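
The arithmetic behind such a depth estimate can be sketched as follows. The text gives the adiabatic gradient and the range of T_s but not how T_s varies with depth, so this example assumes a solidus that rises linearly with depth. The function name and the solidus gradient of 2.4 K km⁻¹ are hypothetical values chosen only to illustrate the calculation, not values taken from the paper.

```python
def depth_below_melting_onset(ts_onset_k, adiabat_k_per_km,
                              solidus_k_per_km, ratio=0.92):
    """Depth increment (km) below the onset of melting at which T = ratio * T_s.

    Assumes T = T_s at the onset depth and that both the adiabat and the
    solidus increase linearly with depth (the linear solidus is an
    illustrative assumption). Solves
        T0 + a*dz = ratio * (T0 + s*dz)
    for dz.
    """
    denominator = ratio * solidus_k_per_km - adiabat_k_per_km
    if denominator <= 0:
        raise ValueError("solidus must rise fast enough for T/T_s to fall with depth")
    return (1.0 - ratio) * ts_onset_k / denominator

# T_s = 1,700 K at the onset of melting, adiabat of 0.5 K/km from the text,
# hypothetical solidus gradient of 2.4 K/km
offset_km = depth_below_melting_onset(1700.0, 0.5, 2.4)
```

With these inputs the offset is close to 80 km, which falls within the 50–100 km range quoted above. A different assumed solidus gradient would shift the result.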

We apply our prediction to the asthenosphere beneath the Pacific 
basin, where the horizontal flow of the mantle starting from beneath 
the East Pacific Rise is well resolved by seismic tomography (Fig. 5).
Electrical conductivity measurements across the East Pacific Rise showed
the bottom of the conductive region to have a depth of ~130 km (ref. 28).
This depth is geochemically consistent with the initiation of silicate
melting. With the same assumptions as above, if T = T_s at this depth
then T = 0.92T_s at ~210 km. We thus predict, first, that the region at
130–210-km depth exhibits strong anisotropy with A-type fabric, which
can produce radial and azimuthal seismic anisotropy with the fast axis
almost parallel to the direction of mantle flow (that is, with the velocity
of horizontally polarized shear waves greater than that of vertically
polarized shear waves), and, second, that the intensity of the anisotropy
increases with decreasing depth. Generally, a depth range of 130–210 km
corresponds to the low-seismic-velocity zone, whose existence has been
explained by thermal and grain size effects on seismic attenuation by
extrapolating experimental results to mantle conditions; however, it
has been questioned whether peridotite with the assumed grain size
(that is, ~1 mm) could deform by dislocation creep at the conditions of
the low-seismic-velocity zone, and it is claimed that a grain size >15 mm
is necessary.

Our results showing the presence of a strong A-type fabric during 
diffusion creep may explain not only the observed seismic anisotropy 
but also the apparent contradiction between the presence of larger seismic 
attenuation and strong radial anisotropy at grain sizes commonly observed
in nature. At depths >210 km, the anisotropy becomes undetectable
or very weak, which has been attributed to a switch from
dislocation creep to diffusion creep or to a pressure-induced transition
of an easy slip system of olivine crystal. Our results showing a
very weak E-type fabric at T < 0.92T_s may account for the seismic characteristics
of this deeper region. In the shallow portion of the upper
mantle with T > T_s, we predict the presence of A- and B-type fabrics
in melt-poor and melt-rich regions, respectively, because such a fabric 
transition requires a change in the ratio of crystal—crystal interfaces to 
crystal-melt interfaces in the rock. Our prediction is supported by pre- 
liminary deformation experiments in the presence of <1 vol.% melt 
(Extended Data Fig. 8). 

Recent seismological studies indicate the presence of melt layers at 
the lithosphere–asthenosphere boundary, where the rock is considered
to lose grain-to-grain contact within the layers and to be essentially free
of melt outside the layers. The presence of such layers was experimentally
reproduced in sheared olivine-plus-basalt aggregates with development
of olivine CPO in B- or AG-type fabric (AG-type is a mixture of A
and B types). Grain growth is a thermally activated process, and so grain 
size will be almost maximized beneath the ridge and will remain constant 
during horizontal flow of the mantle, which will cool with distance from 
the ridge. Because grain growth is saturated, the grain shape will remain 
the same, and subsequent deformation will consequently produce a CPO 
determined by grain shape rather than by the deformation conditions, 
as in the experimental result of Extended Data Fig. 7. This prediction 
explains the ~210-km depth of the bottom of the anisotropic mantle 
beneath the Pacific very far from the East Pacific Rise (Fig. 5). 

Figure 5 | Proposed depth distributions of olivine crystal shape and
fabrics during diffusion-accommodated GBS creep of peridotite in the
asthenosphere. Estimated depths corresponding to T_s and 0.92T_s for the East
Pacific Rise (EPR) are added for comparison with the seismic anisotropy profile
beneath the Pacific basin. Circular, tabular and hexagonal grains represent


Because diffusion creep operates at finer grain size for a given stress 
and temperature, we may ask whether our observations are limited to 
very fine-grained aggregates such as those used in our study. Taking 
into account plausible stress and grain sizes in Earth’s mantle, olivine 
aggregates are predicted to deform either by diffusion or dislocation 
creep. Furthermore, a significant contribution of GBS to sample strain,
sometimes referred to as GBS creep irrespective of its rate-controlling
process, often appears near the transition between diffusion and dislocation
creep. The effects of dynamic recrystallization during dislocation
creep are easily identified in rocks; however, dislocation creep
may be limited to very shallow portions of Earth’s interior, such as
within the lithosphere, where stresses of >10 MPa may drive deformation.
Historically, diffusion creep was believed to dominate in Earth’s
interior owing to its occurrence at high temperature and low stress.
Soon after the discovery of the anisotropic nature of Earth’s interior, 
especially in the upper portion of the mantle, the importance accorded 
to diffusion creep decreased. We have reported experimental evidence 
showing that the mantle anisotropy can be formed during diffusion 
creep and that the predicted anisotropy based on our observed CPO 
does not contradict seismic observations. Diffusion creep is thus once 
again the favoured mechanism of mantle flow. 


METHODS SUMMARY 


Experimental samples were prepared through vacuum sintering of mineral powders 
with nanometre-sized grains. A uniaxial mechanical testing machine was used for 
creep tests. We used SEM and SEM EBSD to analyse sample microstructures and 
the crystallographic orientation of forsterite grains. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS


Mineral powders for sintering Fo+Di were prepared through solid-state reactions
of nano-sized powders of Mg(OH)2, colloidal SiO2 and CaCO3 at 1,000 °C,
and those for Fo+An were prepared using Mg(OH)2, colloidal SiO2, CaCO3 and
Al2O3 at 800–1,000 °C (see details in refs 21, 43). We applied a vacuum sintering
technique at 1,150–1,280 °C to obtain aggregates with >99.9% density and a homogeneous
phase distribution. We changed time durations for the sintering to minimize
grain size differences among the creep experiments, resulting in an almost constant
grain size of ~0.4 µm for melt-free samples and, owing to faster grain growth,
10–20 µm for melt-bearing samples (Extended Data Table 1). Introduction of diopside
as a secondary phase helps to stabilize the microstructure through grain boundary
pinning in melt-free samples.

The resultant calcined powders were cold-pressed at an isostatic pressure of
200 MPa into 5 mm × 10 mm × 30 mm bars and cylinders 8 mm in diameter and
15 mm long. Subsequently, they were vacuum-sintered at 1,150, 1,200 or 1,280 °C 
depending on the temperature conditions for subsequent creep experiments. In 
general, the sintering temperatures and time duration were lower than those for 
subsequent creep experiments, to achieve equilibrium grain shape at each defor- 
mation temperature via grain growth. However, Fo+ Di samples deformed in tension 
were all sintered at 1,280 °C. The effect of the higher sintering temperature on CPO 
development in the tension experiments at 1,200 and 1,250 °C is limited, because 
grain growth that could help to change grain shape was detected in these experi- 
ments and the grains were, as a result, expected to acquire equilibrium grain shape 
at the experimental temperatures (Extended Data Table 1). The bar-shaped sintered 
aggregates for tension experiments were machined to a gauge length of 12 mm, a
width of 2 mm and a thickness of 2 mm (Extended Data Fig. 5a). Samples were
deformed in a uniaxial mechanical testing machine with a furnace attached. Silicon
carbide rods consisting of two to three parts with flexible joints were used to align
samples to the tensile geometry after a small amount of displacement. The samples
for the compression experiments were pressed using SiC rods with Al2O3 and SiC
spacers. Testing temperatures were established by increasing the temperature at
650 °C h⁻¹. Tension experiments on melt-bearing samples and Fo+Di at 1,150 °C
were limited owing to sample failure at the beginning of the experiments. Strain
was determined from the crosshead displacement by considering the compliance
of the apparatus and by assuming uniform elongation in the gauge portion. Thus,
we use true strain (ε) instead of nominal strain when discussing the strain effect on
creep characteristics. The force was determined using a load cell attached to the
crosshead of the testing machine. Data were collected every 1–2 s. All samples were
quenched at a rate of 20 °C min⁻¹ to preserve deformation microstructure.
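
To make the distinction between true and nominal strain concrete, the sketch below uses the standard definition of true strain as the logarithm of the length ratio, reported here as a magnitude so that tension and compression both give positive values. The function names are hypothetical, and the compliance correction mentioned above is not modelled.

```python
import math

def true_strain(initial_length, final_length):
    """Magnitude of true (logarithmic) strain for uniform deformation."""
    return abs(math.log(final_length / initial_length))

def nominal_tensile_strain(true_eps):
    """Fractional elongation corresponding to a true tensile strain."""
    return math.exp(true_eps) - 1.0

def shortening_fraction(true_eps):
    """Fractional shortening corresponding to a true compressive strain."""
    return 1.0 - math.exp(-true_eps)

# A 12 mm gauge length stretched to a true strain of 1.5
final_gauge_mm = 12.0 * math.exp(1.5)
eps_tension = true_strain(12.0, final_gauge_mm)
elongation = nominal_tensile_strain(1.5)    # about 3.48, that is, ~348%
shortening = shortening_fraction(0.6)       # about 0.45
```

A true strain of 1.5 therefore corresponds to an elongation of well over 100%, consistent with the description of the high-T tension samples as superplastic.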

After the tests, all the samples were polished in the plane parallel to the tension 
or compression direction. Mechanochemical polishing with colloidal silica was 
used to prepare samples for EBSD analyses. A thermal grooving technique at 
temperatures >100 °C lower than those used for the experiments was applied for
0.5 h in air to expose grain and interphase boundaries to analyse grain size, shapes 
and phase distributions (Fig. 3). Microstructural changes during the thermal etch- 
ing were negligible. We did not apply any etching techniques to the specimens for 
transmission electron microscopy (TEM) observations. An ion slicer was used to 
obtain a thin section for TEM. We measured the diameter of each grain before and 
after deformation by approximating the grain shape to an ellipse using imaging 
software. The mean diameter of the ellipse was assumed to represent the grain size 
of the sample. Aspect ratios were determined from the long and short axes of the 
ellipse. The ratios were used in the analyses of the effect of grain shape on CPO inten- 
sity (Fig. 4). More than 170 grains were measured in each sample. 
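
The grain measurements described above can be summarized with a short helper. This sketch assumes that each grain has already been fitted with an ellipse and that the "mean diameter" is the arithmetic mean of the long and short axes. The text does not specify which mean was used, so that choice, the function name and the example values are all assumptions.

```python
def grain_statistics(ellipse_axes_um):
    """Mean grain size and mean aspect ratio from fitted ellipses.

    ellipse_axes_um: list of (long_axis, short_axis) pairs in micrometres.
    The grain size of one grain is taken as the arithmetic mean of its two
    axes (an assumption; the source only says 'mean diameter').
    """
    sizes = [(long_axis + short_axis) / 2.0 for long_axis, short_axis in ellipse_axes_um]
    ratios = [long_axis / short_axis for long_axis, short_axis in ellipse_axes_um]
    count = len(ellipse_axes_um)
    return sum(sizes) / count, sum(ratios) / count

# Two hypothetical grains; a real sample would contain more than 170
mean_size_um, mean_aspect_ratio = grain_statistics([(1.4, 1.0), (2.6, 2.0)])
```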

We estimate grain size at any point of the deformation using a dynamic grain
growth law, $d_{\varepsilon} = d_{\mathrm{ref}}\exp(\alpha\varepsilon)$, where $d_{\varepsilon}$ is grain size after a strain of ε, $d_{\mathrm{ref}}$ is grain
size in the reference sample and α is a coefficient of ~0.2 for melt-free samples,
in which the applicability of the law to the forsterite system was confirmed. Here we
use d as an average grain size without distinguishing between phases. We estimated
$d_{\mathrm{ref}}$ from a static grain growth law, $d^{m} - d_{0}^{m} = kt$, where $d_{0}$ is the initial grain size, k is a growth coefficient
estimated from a fit of the observed grain sizes as a function of time (t) and m is a
growth exponent characteristic of the growth process. We use m = 5 for melt-free
samples, which best explains the observed grain growth under static conditions.
This value was also obtained in our grain growth experiments of forsterite plus
enstatite. The procedures used here to determine grain size during the deformation
experiments are well described in ref. 8. We estimated grain growth in melt-bearing
samples by using a static grain growth law, because we did not observe
deformation-induced grain growth in these samples (Extended Data Table 1). This
result is predicted from the dynamic grain growth model, in which the growth is
controlled by a reduction of the number of the boundary-pinning grains (that is,
the secondary crystalline phase) such that dynamic growth is not expected in the
forsterite-plus-melt system. We use m = 4 for melt-bearing samples, which best
explains our observed grain growth. This value is consistent with previous results
from a grain growth experiment on partly molten olivine (ref. 44). All the estimated grain
sizes ($d_{\varepsilon}$ in Extended Data Table 1) were used to correct the flow stress to a grain
size of 1 µm by imposing p = 2 (Fig. 1). This value is consistent with previously
obtained values in the forsterite system at low stress. Based on sample hardening
and predicted grain size, our value of p contains a large uncertainty; however,
n is still constrained to lie between 0.9 and 1.3 even with values of p ranging from
1 to 3.
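
The two grain growth laws used here can be evaluated directly. The sketch below implements the static law solved for d and the dynamic law, with the exponent m and coefficient α taken from the text. The function names and the numerical inputs in the example (initial size, growth coefficient, time) are hypothetical.

```python
import math

def static_grain_size(d0_um, k, time_s, m=5.0):
    """Grain size after static annealing, from d**m - d0**m = k * t.

    k is in um**m per second. m = 5 is the melt-free value in the text;
    m = 4 is the value used for melt-bearing samples.
    """
    return (d0_um ** m + k * time_s) ** (1.0 / m)

def dynamic_grain_size(d_ref_um, strain, alpha=0.2):
    """Grain size after a true strain, from d_eps = d_ref * exp(alpha * strain)."""
    return d_ref_um * math.exp(alpha * strain)

# Hypothetical inputs chosen so that the static result is easy to verify:
# 1**5 + 31 * 1 = 32 = 2**5
d_static = static_grain_size(1.0, 31.0, 1.0, m=5.0)

# Reference grain size of 0.4 um deformed to a strain of 0.6
d_dynamic = dynamic_grain_size(0.4, 0.6)
```

With α ≈ 0.2, a strain of 0.6 enlarges the grains by only about 13% relative to the reference sample, which gives a sense of the size of the correction applied before normalizing to 1 µm.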


43. Koizumi, S. et al. Synthesis of highly dense and fine-grained aggregates of mantle
composites by vacuum sintering of mineral nano-powders. Phys. Chem. Miner.
37, 505-518 (2010).

44. Faul, U. H. & Scott, D. Grain growth in partially molten olivine aggregates. Contrib.
Mineral. Petrol. 151, 101-111 (2006).

Extended Data Figure 1 | Typical stress–strain curve for a Fo+Di sample.
The result was obtained from the compression experiment on the sample 
KF-160 at a constant displacement rate (v). The CPO of forsterite for this 
sample is shown in Fig. 2. 

Extended Data Figure 2 | The CPO of forsterite grains from tension
experiments on Fo+Di samples at 1,330 °C with different amounts of strain
(KS-16, ε = 1.1; KS-17, ε = 1.5). The CPO of the sample with the smallest
strain (KS-13, ε = 0.6) is shown in Fig. 2.

Extended Data Figure 3 | Angles between crystallographic axes of forsterite
and apparent long axes of forsterite grains in the Fo+Di samples.
Crystallographic a, b and c axes were measured by SEM/EBSD, whereas the
long axes were determined from the grain shapes. Highly anisotropic grains
were selectively measured. N, number of measured grains. a, Angle frequency in
the sample statically annealed at 1,330 °C for 20 h (NV-323). b, CPO of the
grains analysed in a. c, Angle frequency in the sample compressed at 1,330 °C
(KF-125). The angles were measured in sections cut perpendicularly to the
compression axis in the deformed sample. d, CPO of the grains analysed in
c. Note that b and c axes are perpendicular to the long axes in the deformed
samples, whereas only b axes are clearly perpendicular to the long axes in the
non-deformed samples. The CPO in b indicates that the highly anisotropic
grains were well identified when the b axes of forsterite grains were parallel to
the sample section when the grains were randomly oriented, whereas the CPO
in d indicates that the highly anisotropic grains were well identified when the
b axes of the grains were perpendicular when the grains were preferentially
oriented in the sample. Thus, we can conclude that the longest axis of each grain
was parallel to the a axis and that the second longest axis was parallel to the
c axis. The shortest grain axis should be parallel to the b axis.

Extended Data Figure 4 | TEM and high-resolution TEM images of a
Fo+Di sample after the compression experiment at 1,350 °C (KF-191).
a, TEM image of multiple grains showing the large-scale structure of grain
boundaries. b, High-resolution TEM image of the forsterite grain boundary
indicated by the arrow in a, showing that the boundary is parallel to the (010)
plane of the central forsterite grain. Bending contrasts and a circular contrast
from beam damage are observed. No dislocations are identified.

Extended Data Figure 5 | Specimens of Fo+Di. a, Before the tension deformation experiment. b, After the deformation experiment at 1,330 °C (KS-17)
achieving ε = 1.5. The CPO of this sample is shown in Extended Data Fig. 2.
Extended Data Figure 6 | Schematic illustration of CPO formation during
GBS. Anisotropic grains rotate under the operation of GBS, where GBS is easy
on the long straight grain boundaries relative to the short grain boundaries
(modified after ref. 16). Overlap and cavities formed at intergranular regions
during GBS are removed and compensated, respectively, by atomic diffusion.
Easy GBS planes align in the flow direction followed by the grain rotation,
resulting in the formation of CPO in our samples.
Extended Data Figure 7 | The CPO of forsterite grains in a Fo+Di sample
(KF-172). The sample was statically annealed at 1,330 °C for 10 h and then
deformed under compression (ε = 0.6) at 1,200 °C. The initial grain size of this
sample before the creep test was already large (Extended Data Table 1).
Extended Data Figure 8 | Sample of spinel (0.5 vol.%)-doped forsterite
(50 vol.%) plus Ca-bearing pyroxene deformed at ~1,320 °C with ε = 1.0.
a, Secondary electron image of the sample. A very small amount of melt
(~1 vol.%; black) is present, mostly at triple-grain junctions. Dark grey,
forsterite; light grey, pyroxene. b, Lower-hemispheric projections of the
crystallographic orientations of forsterite grains measured by SEM/EBSD. N,
number of measured grains. Both grain shape and CPO resemble that observed
at high T.
Extended Data Table 1 | Experimental data

Composition  Run  T (K)  v (mm/s)  t (sec)  ε(t)  ε̇(t) (/sec)  d0 (µm)  dfin (µm)  dref(t) (µm)  estimated d in deformed sample at t (µm)  σ(t) (MPa)  ε̇(t, d = 1 µm) (/sec)  LPO J-index  Fo aspect ratio
Fo+Di KF-171 1423 1.7E-04 779 0.02 1.9E-05 0.25 0.31 0.28 0.28 18.6 1.5E-06 2.16 1.35 
KF-158 1473 1.7E-04 3720 0.06 1.9E-05 0.39 0.46 0.40 0.41 48.0 3.1E-06 2.21 1.41 


KF-107 1473 1.2E-03 960 0.09 1.3E-04 0.41 0.44 0.41 0.41 300.0 2.2E-05 2.03 1.39 


1473 1.7E-03 1200 0.13 1.9E-04 0.41 - 0.41 0.42 418.0 3.4E-05 - - 
1473 1.0E-03 1530 0.18 1.2E-04 0.41 - 0.41 0.42 313.0 2.1E-05 - - 
1473 5.0E-04 2570 0.28 6.7E-05 0.41 - 0.41 0.42 230.0 1.2E-05 - - 
1473 1.7E-04 3870 0.37 2.4E-05 0.41 - 0.41 0.43 108.0 4.5E-06 - - 


KF-157 1523 1.7E-04 1674 0.10 1.9E-05 0.39 0.54 0.40 0.48 9.0 4.5E-06 2.04 1.39 
KF-106 1523 6.7E-04 671 0.04 6.6E-05 0.41 0.57 0.42 0.42 32.5 1.2E-05 2.02 1.37 


1523 1.3E-03 1431 0.10 1.4E-04 0.41 - 0.42 0.44 71.7 2.7E-05 - - 
1523 2.0E-03 1639 0.14 2.2E-04 0.41 - 0.43 0.44 106.0 4.3E-05 - - 
1523 3.3E-03 1836 0.21 3.9E-04 0.41 - 0.43 0.45 192.0 8.0E-05 - - 
1523 1.0E-03 2137 0.27 1.2E-04 0.41 - 0.43 0.46 77.3 2.7E-05 - - 
1523 6.7E-04 2639 0.33 8.8E-05 0.41 - 0.44 0.47 58.0 2.0E-05 - - 
1523 3.3E-04 5339 0.42 4.8E-05 0.41 - 0.46 0.51 39.7 1.2E-05 - - 
1523 1.0E-03 5930 0.49 1.6E-04 0.41 - 0.46 0.52 114.0 4.2E-05 - - 
KF-190 1573 1.7E-04 1625 0.03 1.8E-05 0.39 0.75 0.47 0.48 3.2 4.2E-06 4.56 1.44 
KF-108 1573 3.3E-04 1550 0.03 3.3E-05 0.41 0.79 0.47 0.48 5.1 7.5E-06 3.37 1.40 
1573 6.7E-04 4457 0.15 7.4E-05 0.41 - 0.54 0.56 13.1 2.3E-05 - - 
1573 1.3E-03 4777 0.18 1.5E-04 0.41 - 0.55 0.57 22.3 5.0E-05 - - 
1573 2.0E-03 5054 0.23 2.4E-04 0.41 - 0.55 0.58 34.0 8.2E-05 - - 
1573 1.2E-03 5354 0.29 1.5E-04 0.41 - 0.56 0.60 25.7 5.3E-05 - - 
1573 1.7E-04 9090 0.88 3.8E-05 0.41 - 0.61 0.75 12.4 2.2E-05 - - 


KF-160 1603 1.7E-04 555 0.01 1.8E-05 0.39 0.99 0.46 0.48 1.5 4.2E-06 5.09 1.47 
KF-125* 1603 1.3E-03 2326 0.07 1.4E-04 0.41 0.84 0.56 0.57 8.3 4.3E-05 6.55 1.45 


1603 6.7E-03 2643 0.16 7.4E-04 0.41 - 0.57 0.59 37.0 2.6E-04 - - 
1603 1.7E-03 2938 0.26 2.0E-04 0.41 - 0.58 0.62 11.8 7.8E-05 - - 
1603 3.3E-04 4505 0.40 4.7E-05 0.41 - 0.62 0.69 5.4 2.2E-05 - - 
1603 6.7E-04 6225 0.46 1.0E-04 0.41 - 0.66 0.74 10.3 5.4E-05 - - 
1603 1.3E-03 6760 0.52 2.1E-04 0.41 - 0.67 0.76 19.5 1.2E-04 - - 
KF-191 1623 1.7E-04 3525 0.07 2.0E-05 0.39 1.05 0.66 0.69 1.0 9.3E-06 6.00 1.51
KF-137 1623 3.3E-03 500 0.07 3.6E-04 0.41 0.94 0.49 0.50 9.8 8.9E-05 11.79 1.49
1623 6.7E-03 830 0.20 8.1E-04 0.41 - 0.52 0.54 21.6 2.4E-04 - -
1623 1.0E-02 920 0.33 1.4E-03 0.41 - 0.53 0.56 39.2 4.4E-04 - -
1623 1.7E-03 1015 0.39 2.5E-04 0.41 - 0.54 0.58 9.4 8.2E-05 - -
1623 3.3E-03 1400 0.51 5.6E-04 0.41 - 0.57 0.62 19.2 2.1E-04 - -
1623 1.0E-03 1600 0.58 1.8E-04 0.41 - 0.58 0.64 8.0 7.3E-05 - -
KF-172 1473 1.7E-04 26685 0.62 3.2E-05 0.86 0.88 1.01*%* 0.88 181.6 2.5E-05 3.41 1.eo"" 
Fo+Di (molten) KF-209 1658 1.7E-03 2580 0.11 1.7E-04 - 21.80 18.30** - 0.3 - 2.09 1.36
Fo+Di KS-8 1473 1.7E-04 61835 0.62 7.5E-06 0.41 0.54 0.51** - 48.9 - 1.96 1.39
KS-10 1523 1.7E-04 61845 0.62 7.5E-06 0.41 0.76 0.73** - 29.2 - 1.91 1.41
KS-12 1573 2.2E-04 48000 0.62 9.7E-06 0.41 1.02 0.96** - 26.2 - 2.31 1.46
KS-13* 1603 2.2E-04 48000 0.62 9.7E-06 0.41 1.22 1.20** - 17.7 - 2.81 1.54
KS-14 1623 2.2E-04 48020 0.62 9.7E-06 0.41 1.24 his . 9.81 - 3.36 1.58 
KS-16 1603 7.5E-04 34080 1.14 2.0E-05 0.41 1.33 0.94** - 26.0 - 5.12 1.56
KS-17* 1603 8.3E-04 48060 1.47 1.4E-05 0.41 1.40 - - 37.4 - 6.88 1.54
NV-323 1603 - 72000 0.00 - 0.41 - 1.06** - - - - -
Fo+An-melt KF-186 1533 - 76563 0.37 4.8E-06 - 9.26 10.33** 9.25** 0.2 4.1E-04 2.19 1.31
KF-202 1533 5.0E-04 650 0.03 5.0E-05 7.07 7.23 7.07 7.07 0.8 2.5E-03 - 1.32
1533 4.2E-04 900 0.04 4.2E-05 7.07 - 7.07 7.07 0.7 2.1E-03 - -
1533 2.5E-04 1300 0.05 2.5E-05 7.07 - 7.07 7.07 0.5 1.3E-03 - - 
1533 8.3E-05 2000 0.06 8.5E-06 7.07 - 7.08 7.08 0.2 4.3E-04 - - 


* Data were partially reported in ref. 14
** Observed grain size
*** Initial aspect ratio
KF, compression experiments; KS, tension experiments; NV, statically annealed experiments; v, displacement rate; t, experimental time; ε(t), strain at t; ε̇(t), strain rate at t; d0, initial grain size (t = 0); dref(t),
estimated grain size in reference (and/or statically annealed) samples at t; σ(t), stress at t; dfin, final grain size observed in deformed samples; d,, estimated grain size in deformed samples; ε̇(t, d = 1 µm), corrected
strain rate at t and grain size of 1 µm.
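
The corrected strain rates in the table can be reproduced, to within rounding, by scaling each measured strain rate by the square of the estimated grain size in the deformed sample. The exponent of 2 is an assumption inferred here from the tabulated numbers, not a value stated in the table notes, and the function name is illustrative only. A minimal sketch:

```python
def normalize_strain_rate(strain_rate, grain_size_um, exponent=2.0):
    """Scale a strain rate (/sec) measured at grain_size_um (µm) to a 1 µm grain size.

    The default exponent of 2 is an assumption inferred from the tabulated
    values; it is not stated in the table notes.
    """
    if strain_rate < 0 or grain_size_um <= 0:
        raise ValueError("strain rate must be non-negative and grain size positive")
    return strain_rate * grain_size_um ** exponent


# KF-107, first step: 1.3E-04 /sec at an estimated grain size of 0.41 µm
print(f"{normalize_strain_rate(1.3e-4, 0.41):.1e}")  # tabulated value: 2.2E-05

# KF-202, first step: 5.0E-05 /sec at an estimated grain size of 7.07 µm
print(f"{normalize_strain_rate(5.0e-5, 7.07):.1e}")  # tabulated value: 2.5E-03
```

Small mismatches in other rows are expected, because the tabulated strain rates and grain sizes are themselves rounded.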
A regenerative approach to the treatment of multiple sclerosis

Vishal A. Deshmukh, Virginie Tardif, Costas A. Lyssiotis, Chelsea C. Green, Bilal Kerman, Hyung Joon Kim,
Krishnan Padmanabhan, Jonathan G. Swoboda, Insha Ahmad, Toru Kondo, Fred H. Gage, Argyrios N. Theofilopoulos,
Brian R. Lawson, Peter G. Schultz & Luke L. Lairson


Progressive phases of multiple sclerosis are associated with inhibited differentiation of the progenitor cell population that
generates the mature oligodendrocytes required for remyelination and disease remission. To identify selective inducers
of oligodendrocyte differentiation, we performed an image-based screen for myelin basic protein (MBP) expression
using primary rat optic-nerve-derived progenitor cells. Here we show that among the most effective compounds
identified was benztropine, which significantly decreases clinical severity in the experimental autoimmune
encephalomyelitis (EAE) model of relapsing-remitting multiple sclerosis when administered alone or in combination with
approved immunosuppressive treatments for multiple sclerosis. Evidence from a cuprizone-induced model of
demyelination, in vitro and in vivo T-cell assays and EAE adoptive transfer experiments indicated that the observed efficacy of this
drug results directly from an enhancement of remyelination rather than immune suppression. Pharmacological studies
indicate that benztropine functions by a mechanism that involves direct antagonism of M1 and/or M3 muscarinic
receptors. These studies should facilitate the development of effective new therapies for the treatment of multiple
sclerosis that complement established immunosuppressive approaches.


Remyelination persists throughout adulthood in the central nervous system
and involves the generation of new myelinating oligodendrocytes.
Despite some controversy regarding their intrinsic in vitro and in vivo
lineage potential, compelling evidence indicates that a widespread
proliferating population of nerve and glial antigen-2 (NG2),
platelet-derived growth factor receptor alpha (PDGFR-α) positive cells, termed
NG2-glia or oligodendrocyte precursor cells (OPCs), are the major source
of newly formed mature oligodendrocytes required for remyelination.
Remission in multiple sclerosis is largely dependent on migration of
OPCs to sites of injury and subsequent differentiation to mature cells
capable of repair. Studies evaluating the presence and relative densities
of OPCs at sites of chronically demyelinated multiple sclerosis
lesions indicate that it is not a failure of repopulation or migration of
OPCs, but rather inhibition of OPC differentiation at sites of injury
that contributes to disease progression. As such, the identification
of small molecules that selectively induce differentiation of OPCs at
sites of demyelinated lesions and thereby enhance remyelination
would have a considerable impact on the development of new effective
treatments for multiple sclerosis.


High-throughput OPC differentiation screen 


To identify drug-like small molecules that selectively induce OPC
differentiation, we developed a high content imaging assay based on
the induction of MBP expression in primary rat optic nerve-derived
OPCs cultured for 6 days under basal differentiation conditions. Primary
rodent OPCs proliferate in vitro when cultured in serum-free media
containing PDGF-AA. Upon withdrawal of PDGF-AA, immature
A2B5+ OPCs cease to proliferate, but also fail to efficiently differentiate
into MBP producing mature oligodendrocytes. Addition of thyroid hormone
(triiodothyronine; T3), a known inducer of OPC differentiation,
at the time of mitogen withdrawal results in the differentiation of OPCs
to MBP-positive oligodendrocytes after 6 days of culture (Extended
Data Fig. 1a). However, T3 has several physiological effects that make
it unattractive as a therapeutic agent for multiple sclerosis. This assay
was adapted to a high-throughput format and used to screen a collection
of ~100,000 structurally diverse molecules (Extended Data Fig. 1b).
This led to the identification of several previously identified inducers of
OPC differentiation (Extended Data Fig. 1c, summarized in
Supplementary Table 1). Unfortunately, these molecules have limited
therapeutic potential due to off-target activities, toxicity, poor brain exposure
and/or demonstrated lack of in vivo efficacy. Among the most effective
inducers of OPC differentiation was benztropine (half-maximum effective
concentration (EC50) ~500 nM) (Fig. 1a and Extended Data Fig. 2a, b),
which we chose to investigate further because it is an orally available
approved drug that readily crosses the blood-brain barrier.
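
To make the reported potency concrete, the sketch below evaluates a standard Hill dose-response curve with the EC50 set to the ~500 nM quoted above. The Hill slope of 1, the normalized 0 to 1 response scale and the function name are assumptions for illustration; the fitted curve parameters are not given in this section.

```python
def hill_response(conc_nM, ec50_nM=500.0, hill_slope=1.0, bottom=0.0, top=1.0):
    """Fractional response at conc_nM under a Hill (four-parameter logistic) model.

    ec50_nM follows the ~500 nM reported for benztropine; hill_slope, bottom
    and top are illustrative assumptions, not fitted values.
    """
    if conc_nM <= 0:
        raise ValueError("concentration must be positive")
    return bottom + (top - bottom) / (1.0 + (ec50_nM / conc_nM) ** hill_slope)


# Response across a hypothetical concentration series (nM)
for conc in (50, 500, 1500, 5000):
    print(conc, round(hill_response(conc), 2))
```

By construction the response is half-maximal at the EC50, so under these assumptions the 1.0 to 1.5 µM concentrations used in Fig. 1 sit on the upper part of the curve.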
Benztropine-induced in vitro differentiation of rodent OPCs was 
confirmed by evaluating the transcription and translation levels of the 
oligodendrocyte-specific markers MBP and myelin oligodendroglial 
glycoprotein (MOG) by western blot and quantitative polymerase chain 
reaction with reverse transcription (qRT-PCR) analysis (Extended 
Data Fig. 2c, d). Additionally, in vitro OPC differentiation activity 
was confirmed by immunofluorescence analysis using multiple mar- 
kers specifically expressed in mature oligodendrocytes following 6 days 
of compound treatment (Extended Data Fig. 2e). Furthermore, tran- 
script levels of cyclin D1, cyclin D2, c-Fos and c-Jun were significantly 
decreased in benztropine-treated OPCs, consistent with general inhi- 
bition of cell cycle progression (Extended Data Fig. 2f). To determine 
the stage of OPC differentiation at which benztropine is active, we
treated OPCs for differing durations starting at several time points 
(Extended Data Fig. 2g, h). Maximal induction of MBP expression 
Figure 1 | Benztropine induces oligodendrocyte precursor cell
differentiation and in vitro myelination of co-cultured axons. a, Benztropine
(1.5 µM)- and T3 (1.0 µM)-treated rat OPCs immunostained for MBP (green)
and 4',6-diamidino-2-phenylindole (DAPI, blue). Structure of benztropine.
b, Benztropine (1.0 µM)- and T3 (1.0 µM)-treated co-culture of mouse
embryonic-stem-derived neurons with rat OPCs immunostained for TUJ1
(tubulin marker, axons), MBP (oligodendrocytes) and DAPI (nuclei). c, Effect
of benztropine (1.0 µM) treatment on the myelination of axons. Arrows denote
myelinated axons. d, Quantification of total axonal myelination in OPC with
neuron co-cultures (n = 10, mean and s.e.m., ***P < 0.001, ANOVA with
Bonferroni correction).


was observed when the compound was added within 48 h of PDGF-AA
withdrawal and cells were further cultured for at least 5 days,
indicating that this drug probably acts on immature A2B5+ OPCs and
not the intermediate ‘pre-oligodendrocyte’ stage of differentiation.
Benztropine was also found to induce robust differentiation of rat
and mouse OPCs when co-cultured with mouse embryonic-stem-cell-derived
neurons or mouse cortex-derived cells (Fig. 1b and Extended
Data Fig. 3a, b, respectively). We also quantified the effect of benztropine
on the in vitro myelination of axons by quantifying the co-localization 
of MBP positive oligodendrocyte processes and axons (Extended Data 
Fig. 3c). The results showed a significant increase in the absolute amount 
of myelination in benztropine-treated co-cultures (Fig. 1c, d). This 
increase could result from enhanced maturation of oligodendrocytes 
and/or may reflect elevated myelination capacity of mature oligoden- 
drocytes. To evaluate these two possibilities, we normalized the absolute 
amount of myelin to the total number of oligodendrocytes. Benztropine-treated
cultures had a significantly higher percentage of myelinating
oligodendrocytes (Extended Data Fig. 3d), which indicates that benztropine
not only enhances maturation of oligodendrocytes, but also
promotes myelination.


M1/M3 muscarinic receptor antagonism 


Benztropine is used clinically for the management of Parkinson’s
disease and its pharmacological effects are thought to result from its
anticholinergic activity. However, benztropine is also a centrally
acting anti-histamine and dopamine re-uptake inhibitor. To determine
which, if any, of these activities play a role in OPC differentiation,
we evaluated the ability of selective agonists of muscarinic acetylcholine
receptors (mAChRs) or nicotinic acetylcholine receptors (the agonists
carbachol or nicotine, respectively) to block benztropine activity.
Inhibition of benztropine-induced OPC differentiation was observed
in the presence of carbachol (Extended Data Fig. 3e, f), whereas nicotine
had no effect on OPC differentiation (Extended Data Fig. 3g).
The dopamine receptor antagonist haloperidol, the dopamine receptor
agonist quinpirole and the histaminergic receptor agonists histamine
and histamine trifluoromethyl-toluidine (HTMT) had no effect on
benztropine-induced OPC differentiation (Extended Data Fig. 3h-k).
Moreover, neither quinpirole nor nicotinic receptor antagonists (for
example, tubocurarine, mivacurium, mecamylamine, pancuronium,
atracurium or trimethaphan) induced significant OPC differentiation
(Extended Data Fig. 3l). We then evaluated a panel of mAChR antagonists
(atropine, oxybutynin, scopolamine, ipratropium and propiverine)
and found that all induced OPC differentiation in a dose-dependent
manner with differing potencies (Supplementary Table 2), consistent
with a mechanism of action that is dependent on muscarinic receptor
antagonism.

To examine further the role of muscarinic receptor antagonism in 
benztropine-induced OPC differentiation and determine if a more 
potent and/or clinically useful drug could be identified, we evaluated 
a broader panel of structurally diverse muscarinic receptor antago- 
nists. Of the 42 compounds tested, 20, which cluster amongst 4 related 
structural classes, were found to be active (Supplementary Tables 2 
and 3). However, none were found to be more potent than benztro- 
pine. The ability of muscarinic receptor agonists to inhibit benztropine- 
induced OPC differentiation provides strong evidence that muscarinic 
receptor antagonism is an essential component of the mechanism of 
action. The inactivity of several of the muscarinic antagonists we eval- 
uated could be the result of off-target inhibitory activities or toxicity- 
related effects. We cannot rule out the possibility that an additional 
biological activity, common among the active muscarinic receptor 
antagonists identified, is required for OPC differentiation. However, 
consistent with the proposed mechanism, it has recently been demon- 
strated that muscarinic activation causes decreased expression of mye- 
lin proteins in mature oligodendrocytes and modulates the expression 
of known regulators of differentiation (for example, PDGFRα) in
immature OPCs. Furthermore, Notch1 is a known negative regulator
of OPC differentiation and, consistent with the observation that
muscarinic receptor activation causes increased expression of Notch1
(ref. 31), we found that treatment of OPCs with benztropine results
in a significant decrease in Notch1 expression in immature OPCs
(Extended Data Fig. 3m). 

OPCs are known to express mAChRs, predominantly subtypes M1,
M3 and M5 (ref. 34). We confirmed expression of these receptors, as
well as that of the acetylcholine-synthesizing enzyme choline acetyl
transferase (ChAT), by qRT-PCR (Extended Data Fig. 3n, o). Activation
of mAChRs triggers protein-kinase-C-dependent activation of
the MAPK/ERK pathway leading to modulation of c-Fos expression.
Western blot analysis of benztropine-treated OPCs is consistent with
general inhibition of this pathway (decreased phospho-Akt and stimulated
phosphorylation of p38 MAPK and CREB) (Extended Data
Fig. 3p). Activation of M1 and M3 mAChRs is coupled to downstream
signalling events through phospholipase C, which results in increased
intracellular calcium concentrations, whereas M2 and M4 mAChR
activation inhibits adenylate cyclase, leading to decreased intracellular
cAMP levels. In OPCs, benztropine and the muscarinic antagonist
atropine both inhibited carbachol-induced calcium influx (Extended
Data Fig. 3q), but had no effect on cAMP levels (Extended Data Fig. 3r).
Together, these results suggest that benztropine induces OPC differentiation
by a mechanism involving direct antagonism of M1/M3
muscarinic receptors. Acetylcholine is a known regulator of OPC proliferation
and, as such, muscarinic receptor subtypes represent a promising
class of therapeutic targets for the modulation of OPC proliferation
and differentiation.


Efficacy in the PLP-induced EAE model 


We next examined the activity of benztropine in the proteolipid protein 
(PLP)-induced EAE rodent model of relapsing-remitting multiple 
sclerosis. Despite some inherent limitations of this model, all therapies
approved for the treatment of MS decrease the clinical severity of
EAE. This model is most commonly used to evaluate the potential
efficacy of immunosuppressive agents, but can also be used to determine
the effectiveness of promyelinating agents that function by
enhancing OPC differentiation. Benztropine (10 mg per kg) was
administered prophylactically by a daily intraperitoneal (i.p.) injection 
regimen initiated at the onset of PLP immunization. Benztropine dra- 
matically decreased the severity of the acute phase of disease and 
virtually eliminated the relapse phase compared to vehicle-treated 
controls (Fig. 2a and Extended Data Fig. 4a). We next evaluated efficacy 


[Figure 2 graphic. a, EAE severity score versus days for vehicle, benztropine (Thr; 10 mg per kg), benztropine (Pro; 10 mg per kg), FTY720 (Thr; 1 mg per kg) and interferon-β (Thr; 10,000 U). b, Vehicle and benztropine panels labelled GST-π, NG2 and DAPI; scale bars, 200 µm. c, GST-π+ cell counts for vehicle and benztropine; *P < 0.05.]

Figure 2 | Benztropine decreases disease severity in the PLP-induced EAE
model. a, EAE severity scores (ranging from no observable disease to
moribund/dead) following prophylactic (Pro, day 0) or therapeutic (Thr, time
of initial symptoms) treatment with benztropine compared to therapeutically
administered FTY720 or interferon-β (n = 8, mean and s.e.m.,
*P < 0.05; t-test). b, Confocal images of spinal cord sections isolated at day 14
from EAE mice treated prophylactically with benztropine (10 mg per kg) or
vehicle and immunostained for GST-π (mature oligodendrocytes) and NG2.
c, Quantification of GST-π+ cells (n = 30, mean and s.e.m., *P < 0.05, t-test).




when the drug was administered therapeutically, by starting daily injec- 
tions at the first sign of disease. Treatment with benztropine in this 
mode again led to functional recovery; significant decreases in clinical 
severity during remission phases were observed and the occurrence of 
relapse was again virtually eliminated (Fig. 2a). In fact, treatment with 
benztropine in this mode resulted in decreased clinical severity comparable
to, or better than, that observed for the immunosuppressive multiple
sclerosis drugs FTY720 or interferon-β (administered at reported
therapeutic doses in mice) (Fig. 2a). 

In parallel experiments, when benztropine was dosed in a prophy- 
lactic mode, we isolated spinal cords from drug- or vehicle-treated mice 
at various time points before the onset of symptoms and during the 
acute and relapse phases of disease. Sections from many regions were 
stained with Luxol fast blue to visualize myelin or Luxol fast blue and 
hematoxylin and eosin (H&E) to visualize both myelin and infiltrating 
immune cells. Immunohistochemical analysis was also performed 
using antibodies that recognize markers of mature oligodendrocytes 
(MBP and glutathione S-transferase; GST-π), immature OPCs (NG2)
or infiltrating immune cells (CD45). During acute and relapse phases 
of disease, sections from both vehicle- and benztropine-treated mice 
showed significant infiltration by H&E- and CD45-positive immune 
cells (Extended Data Fig. 4b-d). In vehicle-treated mice, infiltration 
corresponded to areas of significant demyelination (Extended Data 
Fig. 4b, e). In contrast, in benztropine-treated mice, a large number 
of immune-cell-infiltrated areas stained positive for Luxol fast blue or 
MBP, a finding consistent with a regenerative versus immunosuppressive 
mechanism (Extended Data Fig. 4b, e). We further evaluated drug- 
enhanced remyelination using confocal microscopy (Fig. 2b). Quanti- 
tative image analysis of many random fields per group indicated that 
benztropine treatment caused a significant increase in the number of 
GST-π+ mature oligodendrocytes from ~500 to ~1,100 per field compared
to vehicle (Fig. 2c and Extended Data Fig. 4d). The observed
increase in mature oligodendrocyte numbers is consistent with a 
mechanism of benztropine-induced clinical recovery that involves 
the stimulation of OPC differentiation, leading to enhanced remyeli- 
nation, in the context of an inflammatory environment. Notably, at 
time points before any observable immune cell infiltration or disease 
onset (day 8), a similar (~twofold) increase in the number of mature 
oligodendrocytes was observed in the spinal cords of benztropine-
treated mice (Extended Data Fig. 4c, d). This observation is consistent
with the time frame of in vitro activity. Furthermore, it is consistent with 
the occurrence of benztropine-induced OPC differentiation in vivo in 
the absence of inflammatory insult and provides an explanation for the 
substantial decrease in clinical severity observed during the acute phase 
of disease when the drug is dosed in a prophylactic mode. Mature oli- 
godendrocytes capable of remyelination are poised for repair (or possibly 
protection) before immunological attack. Importantly, general toxicity 
was not observed either microscopically or macroscopically in drug- 
treated mice following 4 weeks of daily injections at 10 mg per kg. 

Electron microscopy was used to observe myelin surrounding spinal 
cord axons in benztropine- and vehicle-treated mice during the peak of 
the acute phase of disease. Immune-cell-infiltrated areas of spinal cords 
from vehicle-treated mice exhibited characteristic oligodendrogliopa- 
thy along with damaged axons with loose and separated layers of myelin 
sheaths (Fig. 3a, b and Extended Data Fig. 4f). Benztropine treatment
did not influence the infiltration or relative abundance of encephalo- 
genic T cells and other inflammatory immune effector cells, nor did it 
affect the ability of these cells to cause demyelination during the acute 
phase of disease. Evidence for demyelination at this time point is 
provided by the significant (P < 0.01) increases in observed g-ratios
(ratio of axon diameter to myelinated axon diameter) for both drug- 
and vehicle-treated mice compared to non-diseased controls (Fig. 3c 
and Extended Data Fig. 4g-i). Drug treatment resulted in extensive 
remyelination, as evidenced by the presence of abundant newly formed 
and notably thinner (compared to those of non-diseased mice) myelin 
sheaths (a characteristic associated with remyelination) (Fig. 3a, b and
Figure 3 | Benztropine-induced remyelination in the PLP-induced EAE
model. a, b, Electron microscopy images of spinal cords isolated from
benztropine (Pro, 10 mg per kg)- and vehicle-treated EAE mice. c, g-ratios of
spinal cord axons in normal and EAE mice (n = 1,000, mean and s.e.m.,
***P < 0.001, two-way ANOVA). d, Electron microscopy images of spinal
cords isolated from benztropine-treated EAE mice highlighting different
phases of remyelination (initial wrapping (Ax1), partial remyelination (Ax2),
almost remyelinated (Ax3) and normal axon (Ax4)) and associated
morphological features (outer (Oo) and inner (Oi) ends of cytoplasmic
processes of oligodendrocytes wrapped around axons (Ax)).


Extended Data Fig. 4f). Specifically, the g-ratios in benztropine-treated 
mice were significantly lower (P < 0.01) than in vehicle-treated mice, yet 
still significantly higher (P < 0.01) than in non-diseased mice (Fig. 3c 
and Extended Data Fig. 4g-i). Moreover, in these areas, myelin sheaths 
were observed in different phases of remyelination (axons undergoing 
initial wrapping, partial remyelination and almost complete remyelination)
(Fig. 3d and Extended Data Fig. 4j), strongly supporting a
benztropine-induced remyelination effect. 
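
The g-ratio used above is the axon diameter divided by the diameter of the myelinated axon (axon plus sheath), so a thinner sheath gives a ratio closer to 1. The sketch below illustrates that relationship; the function names and the example diameters are hypothetical and are not measurements from this study.

```python
def g_ratio(axon_diameter, fiber_diameter):
    """Ratio of axon diameter to myelinated-axon (fibre) diameter."""
    if not 0 < axon_diameter <= fiber_diameter:
        raise ValueError("need 0 < axon diameter <= fibre diameter")
    return axon_diameter / fiber_diameter


def myelin_thickness(axon_diameter, fiber_diameter):
    """Sheath thickness on one side of the axon, in the same units as the inputs."""
    return (fiber_diameter - axon_diameter) / 2.0


# Hypothetical axons of equal calibre (1.0 µm) with progressively thinner sheaths
for label, fiber in (("thick sheath", 1.5), ("thin sheath", 1.2), ("bare axon", 1.0)):
    print(label, round(g_ratio(1.0, fiber), 2), round(myelin_thickness(1.0, fiber), 2))
```

Read this way, the ordering reported in the text (non-diseased lowest, benztropine-treated intermediate, vehicle-treated highest) corresponds to progressively thinner myelin for a given axon calibre.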


Effect on the immune system of EAE mice 


The primary immunological processes involved in multiple sclerosis 
and EAE are thought to be T-cell mediated. To determine the extent to 
which the efficacy of benztropine in the EAE model results from T cell 
inhibitory activity, we evaluated the effects of benztropine and other 
muscarinic antagonists (atropine, oxybutynin, scopolamine, ipratro- 
pium and propiverine) on T-cell activation and proliferation. Notably, 
it has been reported that muscarinic receptors are expressed on T
cells. However, neither benztropine nor the other muscarinic antagonists
had an effect on T-cell proliferation in vitro, as measured using a
carboxyfluorescein succinimidyl ester (CFSE) labelling assay (Extended
Data Fig. 5a), nor did they affect T-cell activation as determined by
assessing CD4+CD69+ and CD4+CD25+ populations (Extended
Data Fig. 5b, c). We also evaluated the effect of benztropine on the
immune system in vivo in SJL/J mice in which EAE had or had not been
induced with PLP. In both diseased and healthy animals, benztropine
had no effect on the number of splenic naive (CD44lo) or activated
(CD44hi) CD4+ and CD8+ T cells (Extended Data Fig. 5d-f). A minor,
but significant, decrease in B cell numbers was observed following
treatment with benztropine (Extended Data Fig. 5e, f), and benztropine
had no effect on cytokine (IL-2, IL-10, IFN-γ, TNF-α) producing T-cell
populations isolated from drug- or vehicle-treated normal or diseased
mice (Extended Data Fig. 5e, f). We also found that benztropine had no
effect on antigen-induced IgG production (Extended Data Fig. 5g).
Next, we analysed the effect of benztropine on in vitro macrophage 
development and M1/M2 polarization and observed no significant 
differences between benztropine-treated cultures and controls (Extended 
Data Fig. 6). Additionally, the expression of M1 or M2 polarization 
markers in spleens, spinal cords, as well as spinal-infiltrating leuko- 
cytes isolated from EAE mice treated with benztropine for 14 days, 
were equivalent to those of controls (Extended Data Fig. 7). 

We next performed the adoptive transfer model of EAE to further
demonstrate that benztropine does not affect T-cell development or 
function. Splenocytes were isolated from SJL/J donor mice, following 
immunization with PLP and immediate daily injection with benztro- 
pine (10 mg per kg) or saline for 10 days, and then cultured in vitro in 
the presence of benztropine (5 µM) or vehicle, respectively (Extended
Data Fig. 8). Injection of splenocytes from either donor group resulted 
in equally severe transfer of EAE disease to recipient mice (Extended 
Data Fig. 8: groups 1 and 3), providing evidence that benztropine had 
no effect on the development and function of PLP-specific T cells 
in vivo or in vitro. Benztropine (10 mg per kg) treatment of recipient 
mice, injected with splenocytes from vehicle- or benztropine-treated 
donor mice, resulted in significant reductions in clinical severity 
scores (Extended Data Fig. 8: groups 2 and 4) compared to controls 
(Extended Data Fig. 8: groups 1 and 3). Taken together, these results 
strongly suggest that the observed efficacy of benztropine in the EAE 
model results from remyelination arising from enhanced oligoden- 
drogenesis rather than immune suppression. 


Efficacy of benztropine in the cuprizone model 


We further evaluated the ability of benztropine to induce OPC differenti- 
ation and enhance remyelination in vivo using the T-cell-independent 
cuprizone-induced model of demyelination. Inclusion of 0.2% (w/w) 
cuprizone in the diet of C57BL/6 mice induces a demyelination pro- 
gram that proceeds with a defined series of events over a characteristic 
time course in which the corpus callosum shows peak demyelination 
following 6-7 weeks. Spontaneous remyelination is observed 2-4 weeks
following cuprizone withdrawal. By administering drugs when a
cuprizone-free diet is reintroduced, the efficacy of promyelinating
agents can be examined by evaluating the relative kinetics of OPC-dependent
remyelination. Upon withdrawal of cuprizone, we
administered vehicle or benztropine (10 mg per kg) by daily i.p. injections
for 5 weeks. During this time, mice were euthanized weekly and
the degree of remyelination was quantitatively established by staining
the corpus callosum regions of harvested brains with Luxol fast blue
and anti-GST-π antibody. Significant demyelination was clearly
observed after seven weeks of cuprizone treatment compared to naive 
animals (Fig. 4a, top left and bottom left panels). Consistent with 
enhanced OPC differentiation and accelerated remyelination, overall 
myelin staining (Fig. 4a, middle and right panels; 4b; Extended Data 
Fig. 9) and the number of GST-π+ mature oligodendrocytes (Fig. 4c, d)
increased significantly in the corpus callosum at week 2 following 
benztropine treatment compared to the spontaneous remyelination 
observed in vehicle controls. As expected, at later time points, spon- 
taneous remyelination was relatively complete and no significant dif- 
ferences between drug- and vehicle-treated animals were observed. A 
lack of difference at these later time points indicates that, even after five 
weeks of treatment, benztropine is not toxic to mature oligodendro- 
cytes. These data again indicate that benztropine enhances the process 
of in vivo remyelination by directly inducing OPC differentiation. 


Benztropine is dose-sparing with FTY720 

For the treatment of multiple sclerosis, an OPC differentiation-inducing 
drug would most probably be introduced clinically as part of a com- 
bination therapy with an immunosuppressive drug. Using the PLP- 
induced EAE model, we therefore evaluated the clinical efficacy of 
benztropine when combined with either of two immunosuppressive 
drugs approved for the treatment of multiple sclerosis, interferon-β

Figure 4 | Benztropine treatment enhances remyelination in the cuprizone
model. a, Sections from the corpus callosum region of brains isolated from 
either benztropine- or vehicle-treated mice stained with Luxol fast blue. 

b, Quantification of Luxol fast blue (LFB) staining (n = 6 images each from 4 
mice per group, mean and s.d., *P < 0.05, t-test). c, Confocal microscopy 
images of sections from the corpus callosum region of brains isolated from 
either benztropine- or vehicle-treated mice immunostained for GST-π.

d, Quantification of GST-π+ mature oligodendrocytes (n = 20 images each
from 4 mice per group, mean and s.d., **P < 0.01, t-test).


and FTY720 (refs 47, 48). The former reduces T-cell proliferation and 
alters cytokine expression”, whereas the latter is an S1P agonist that 
blocks T-cell trafficking*. Initially, all three drugs were dosed indi- 
vidually over a range of concentrations to determine suboptimal and 
maximal effective/tolerated doses in the EAE model (Extended Data 
Figs 4a and 10a, b). Addition of 2.5 mg per kg of benztropine (sub- 
optimal dose) to reported therapeutic doses of FTY720 (1 mg per kg) 
(Extended Data Fig. 10c) or interferon-β (10,000 U per mouse) (Extended
Data Fig. 10d) resulted in decreased clinical severity comparable to, or
greater than, either FTY720 or interferon-β alone. Furthermore, the
combination of a suboptimal dose of benztropine (2.5 mg per kg) with 
a suboptimal dose of FTY720 (0.1 mg per kg) resulted in a significant 
decrease in clinical severity (Extended Data Fig. 10e) comparable to the 
clinical efficacy observed when FTY720 was given alone at the reported 
therapeutically effective dose of 1 mg per kg (Extended Data Fig. 10f). 
Addition of benztropine to FTY720 does not result in a decrease in 
immune cell infiltration compared to FTY720 treatment alone (Extended 
Data Fig. 10g, h), and addition of FTY720 to benztropine does not result 
in an increase in the number of oligodendrocytes compared to benztro- 
pine treatment alone (Extended Data Fig. 10i, j). This finding indicates 
that the observed benefit (Extended Data Fig. 10k) that results from 
this drug combination is derived from an additive effect that occurs 
when both immunological and remyelination mechanisms are targeted 
simultaneously. This observation may prove clinically relevant, because 
FTY720 treatment is associated with a dose-dependent bradycardia, 
which might be reduced by combination therapy with benztropine, 
and benztropine is associated with dose-dependent adverse neurological 
side effects. 


Discussion 

We have identified a centrally acting drug that, when administered in 
a clinically relevant model of multiple sclerosis, significantly decreases 
disease severity by directly stimulating the differentiation of a progenitor 
cell population leading to enhanced remyelination and functional
recovery. Pharmacological data clearly indicate that the mechanism
of action of benztropine is dependent on muscarinic receptor antago- 
nism, but not any of its other known biological activities. However, 
based on the lack of activity of some of the anti-muscarinics tested, we 
cannot rule out the possibility that an additional unidentified target 
may exist that is common to all of the active anti-muscarinics iden- 
tified. Further, although experimental evidence clearly indicates that 
benztropine induces OPC differentiation and remyelination in vitro 
and in vivo, it is possible that the therapeutic effect observed in the EAE 
model could also result in part from a protective effect derived from 
increased glial cell density. Inclusion of benztropine in EAE treatment 
regimens involving existing approved immunosuppressive drugs 
results in enhanced functional recovery and, in the case of FTY720, 
significantly decreases the dosages of both drugs that are required to 
achieve equivalent efficacy. Successful translation of these findings to 
multiple sclerosis patient populations will require further preclinical 
and clinical evaluation, as benztropine and other anti-muscarinics 
have significant dose-dependent neurological and psychiatric side 
effects. To our knowledge, these results provide the first in vivo evid- 
ence supporting the notion that benefit can be achieved by treating 
multiple-sclerosis-like symptoms using the combination of an immu- 
nosuppressive drug with a remyelination enhancer and may have a 
significant effect on the development of new and more effective ther- 
apies for the treatment of multiple sclerosis. Finally, we are evaluating 
other hits from our screen to identify compounds that induce re- 
myelination by mechanisms distinct from that of benztropine. 


METHODS SUMMARY 


Cell culture and high-throughput screening. Primary rat optic nerve-derived 
OPCs were screened in 384-well format (1.5 µM compound) using basal differ-
entiation conditions. Following immunostaining using an MBP-specific antibody,
plates were imaged and analysed using Opera (Perkin Elmer) or Cell Insight (Thermo) 
high content imaging systems. 

In vivo animal models. The PLP-induced EAE model was performed and scored 
using 8-week-old female SJL/J mice as described’’. Mice received daily i.p. injec- 
tions with vehicle or drug using either prophylactic or therapeutic regimens. For 
the cuprizone model, C57BL/6 mice were fed 0.2% w/w cuprizone with chow for 
7 weeks and thereafter with normal chow, at which point benztropine (10 mg per kg) 
or saline was injected daily (i.p.) for 5 weeks. Spinal cords were isolated and
histological analysis was performed at weeks 0-5 following cuprizone withdrawal. 


METHODS


Statistical methods and experimental design. High-throughput screening was 
performed at a single concentration (1.5 µM) and primary hits were reconfirmed
in triplicate. In all in vivo animal studies, we calculated that we required 8-12 mice 
per group in order to have 80% power of detecting approximately a 30% change, 
assuming a standard deviation of 30% at a significance level of α = 0.05. It has
been our experience that this is sufficient to detect clinically and statistically signi- 
ficant differences with repetitive studies. As per standard scoring methods for 
EAE, mice found moribund or dead were scored as 5 for the remainder of the 
study. For therapeutic dosing, just before drug treatment, mice were randomized, 
based on clinical severity such that each treatment group had equivalent mean 
clinical EAE scores. Mice were scored non-blinded. For all experiments, assuming 
normal distribution, two-sided t-tests were used to evaluate comparisons between 
2 groups and ANOVA was used when >2 groups were compared. For the quanti- 
tative analysis of in vitro myelination, ANOVA with Bonferroni correction was 
used. Where possible, data are represented as mean ± s.e.m., otherwise data are
represented as mean ± s.d., as indicated in the figure legends.

Cell culture. Rat primary optic nerve OPCs were isolated by panning’””’ (>99% 
A2B5+) and cultured in poly-D-lysine (10 µg ml−1)-coated flasks in OPC prolif-
eration media (Neurobasal Media (Invitrogen) supplemented with B27-without
vitamin A (Invitrogen), non-essential amino acids, L-glutamine and PDGF-AA
(30 ng ml−1; Peprotech)) at 37 °C with 5% CO2. OPCs were used between passage
10 and 15 and were never allowed to reach confluence to maintain a naive state.
For differentiation, OPCs were plated in differentiation media (Neurobasal Media
(Invitrogen) supplemented with B27-without vitamin A (Invitrogen), non-essential
amino acids, L-glutamine and PDGF-AA (2 ng ml−1; Peprotech)).

Mouse cortical neural cell mixture was prepared from E18.5 brain using the Papain
dissociation system, according to the supplier’s instructions (Worthington
Biochemical). Then, both oligodendrocytes and OPCs were depleted from the 
mixture by sequential immunopanning as described previously”’*’. The cells were 
then cultured on poly-D-lysine-coated 8-well chamber slides (Nunc) as described
previously*’. Mouse neural stem cells (Invitrogen) were cultured in OPC medium 
as described previously™*. Induced mouse OPCs (5,000 cells per well) were cultured 
with benztropine on the cortex-derived culture for 7 days. Cells were immunola- 
belled as described previously”'. 

High-throughput screening and imaging. OPCs were plated at a density of 1,000
cells per well on poly-D-lysine (10 µg ml−1)-coated 384-well plates (Greiner) in
differentiation media. Compounds were added at a final concentration of 1.5 µM
within 12 h after plating the cells and incubated at 37 °C with 5% CO2 for 6 days.
Cells were then fixed for 20 min with 4% formaldehyde solution and stained with
mouse anti-rat MBP antibody (Millipore) in 3% BSA, 0.3% Triton X-100 with
overnight incubation at 4 °C. The cells were washed and incubated with secondary
antibody (goat anti-mouse Alexa Fluor488) and DAPI (Invitrogen) for 1 h at room
temperature. The cells were washed and plates were sealed and imaged using an
Opera confocal imaging reader (Perkin Elmer) or a Cellomics Cell Insight imaging
reader (Thermo). An air ×10 lens was used to capture 9 images per well at both
wavelengths (488 and 365 nm), representing different locations in a single well. For
image analysis, DAPI-stained nuclei and MBP-positive cell bodies were detected
using an algorithm that selects for positive cell bodies and nuclei within a range of
fluorescent emission values and sizes as determined by fitting parameters to positive
(thyroid hormone (T3), 1 µM) and negative controls (DMSO, 0.1%). Numerical
results from the analysed images were later exported for analysis in Excel
(Microsoft) and Spotfire (Tibco).

Data analysis. EC50 values were obtained by fitting the data using the sigmoidal
dose-response curve-fitting function (Prism; GraphPad Software). Eight or twelve 
different concentrations and three data points per concentration were usually used 
for curve fitting. Experiments were repeated 2 or more times. 
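
As an illustration of the curve fitting described above, the following minimal Python sketch fits a four-parameter logistic (sigmoidal) dose-response model by a coarse grid search. It is not the Prism routine used in the study: the function names, the grid of Hill slopes and the choice to fix the top and bottom plateaus to the observed extremes are all simplifying assumptions made for this example.

```python
import math


def four_param_logistic(log_conc, bottom, top, log_ec50, hill):
    """Response predicted by a sigmoidal dose-response curve at log10(concentration)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - log_conc) * hill))


def fit_ec50(concs, responses, hill_grid=(0.5, 1.0, 1.5, 2.0, 3.0), steps=400):
    """Estimate EC50 (same units as concs) by least-squares grid search.

    Assumption for this sketch: bottom and top plateaus are fixed to the
    smallest and largest observed responses instead of being fitted.
    """
    bottom, top = min(responses), max(responses)
    logs = [math.log10(c) for c in concs]
    lo, hi = min(logs), max(logs)
    best = None  # (sum of squared errors, log10 EC50, Hill slope)
    for i in range(steps + 1):
        log_ec50 = lo + (hi - lo) * i / steps
        for hill in hill_grid:
            sse = sum(
                (four_param_logistic(x, bottom, top, log_ec50, hill) - y) ** 2
                for x, y in zip(logs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, log_ec50, hill)
    return 10.0 ** best[1], best[2]
```

In practice a dedicated nonlinear least-squares routine that also fits the plateaus would be preferable; the grid search only makes the shape of the model and the meaning of EC50 explicit.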

Compounds. Hit compounds were purchased as powders and dissolved in DMSO 
for in vitro studies or saline for in vivo studies. Benztropine mesylate (MP Bio- 
medicals), mycophenolate mofetil (Selleck Chemicals), FTY720 (Cayman Chemical), 
carbachol (EMD Chemicals), haloperidol (Sigma), quinpirole (Sigma), interferon-β,
ipratropium (Sigma), oxybutynin (Sigma), histamine (Acros Organics), scopol- 
amine (Sigma), atropine (Sigma), histamine trifluoromethyl toluidide (HTMT; 
Tocris Bioscience). 

In vitro myelination. Mouse embryonic stem cells (ES cells) were differentiated 
into neurons using a previously published protocol’®. Neurons were allowed to 
mature and extend axons for about one week before plating rat primary OPCs. 
Cells were treated with either DMSO or 1 µM T3 or 1 µM benztropine for two
weeks and myelination under different conditions was quantified. For quantifica- 
tion, co-cultures were fixed for 15 min in 4% PFA and stained with TUJ1 (rabbit; 
Covance) and MBP (mouse; Millipore) antibodies. A total of 10 randomly selected 
regions on 2 plates per experiment were imaged. Images were imported to Imaris 
(Bitplane), and axon and oligodendrocyte processes were identified. Rendered channels
of the processes were exported as TIFF files to be analysed further using a custom
script in MATLAB (Mathworks). Myelination was identified as regions of co- 
localization between MBP positive oligodendrocyte processes and TUJ1 positive 
neurites. Cell bodies were eliminated from the analysis to reduce the error due to 
non-specific overlap of MBP and TUJ1 staining. 
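
The co-localization measure described above can be sketched as follows. This is not the authors' MATLAB script, which is not given in the text: the function name, the use of binary masks stored as nested lists and the normalization to TUJ1-positive area are assumptions made for illustration only.

```python
def colocalized_fraction(mbp_mask, tuj1_mask, cell_body_mask):
    """Fraction of TUJ1-positive pixels that are also MBP-positive.

    All three arguments are equally sized 2D lists of 0/1 values.
    Pixels flagged in cell_body_mask are ignored, mirroring the
    exclusion of cell bodies to limit non-specific overlap.
    """
    overlap = 0
    neurite = 0
    for mbp_row, tuj_row, body_row in zip(mbp_mask, tuj1_mask, cell_body_mask):
        for mbp, tuj, body in zip(mbp_row, tuj_row, body_row):
            if body:
                continue  # skip cell-body pixels entirely
            if tuj:
                neurite += 1
                if mbp:
                    overlap += 1
    return overlap / neurite if neurite else 0.0
```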

Immunocytochemistry and antibodies. Immunostaining of cells was performed 
using standard protocols. Primary antibodies and dilutions are provided in Sup- 
plementary Table 4. Secondary antibodies were Alexa Fluor488 or Alexa Fluor647 
conjugated anti-rabbit or anti-mouse (1:500; Invitrogen). Nuclei were counter- 
stained with DAPI. Cells were imaged using a Nikon confocal microscope with a 
×10 air lens.

Western blot analysis. OPCs were plated in basal differentiation medium and 
treated for 6 days with benztropine (1.5 µM), T3 (1 µM) or DMSO. Cells were
collected by scraping, pelleted and washed with ice-cold phosphate buffered 
saline (PBS) before lysis with PBS containing protease inhibitors (Roche), phos- 
phatase inhibitors (Sigma), Triton X-100 and EDTA. Following incubation on ice 
for 20 min, lysed cells were centrifuged (16,000g, 15 min at 4 °C). Total protein was 
quantified using a NanoDrop and equal amounts of protein from each sample
were denatured by boiling with 4× SDS sample buffer (Invitrogen) containing
β-mercaptoethanol (5%). Proteins were electrophoresed using 4-20% SDS-
PAGE gels (BioRad) and transferred on to a nitrocellulose membrane (BioRad).
The membrane was blocked with 5% non-fat dry milk in TRIS buffered saline with 
Tween-20 (0.2%) and reacted with appropriate antibodies. Blots were incubated 
with HRP-conjugated secondary antibodies and visualized using film and Super 
Signal West peroxide solution (Pierce). 

Semi-quantitative RT-PCR analysis. Total RNA from different samples was 
isolated and purified using the RNeasy mini kit (Qiagen) with on-column DNase 
digestion according to the manufacturer's protocol. Single-stranded cDNA was 
synthesized from 3 µg of total RNA with the SuperScript III First-Strand Synthesis
System for RT-PCR using oligo(dT)20 primers (Invitrogen). PCRs were performed
using the Phusion High Fidelity polymerase and gene-specific primers (Supplemen-
tary Table 5). Cycle parameters for all genes were 30 s at 94 °C, 60 s at 57 °C and 60 s
at 72 °C for 25 cycles.

Quantitative RT-PCR. For expression analysis of MBP, MOG, cyclin D1, cyclin 
D2, c-Fos and c-Jun, RNA was isolated and reverse-transcribed to complementary 
DNA (cDNA) as previously described. cDNA was used for qRT-PCR with gene- 
specific Taqman probes labelled with FAM (Applied Biosystems). qRT-PCR was 
performed using the ABI 7900HT instrument (Applied Biosystems) with stand- 
ard parameters. The amount of cDNA was optimized so that amplification of 
both control genes and the cDNAs of interest were in the exponential phase. Trans-
cripts were quantitated by the comparative Ct method and normalized to endogenous
controls, β-actin and GAPDH.
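
For readers unfamiliar with the comparative Ct calculation, a minimal Python sketch is given here. It assumes roughly 100% amplification efficiency (one doubling per cycle) and averages the Ct values of the endogenous controls; the function name and argument layout are illustrative and are not taken from the study.

```python
def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target transcript by the comparative Ct (2^-ddCt) method.

    ct_refs_* are lists of Ct values for the endogenous control genes
    measured in the same sample (for example beta-actin and GAPDH).
    """
    def mean(values):
        return sum(values) / len(values)

    # Normalize the target to the reference genes within each sample.
    d_ct_treated = ct_target_treated - mean(ct_refs_treated)
    d_ct_control = ct_target_control - mean(ct_refs_control)
    # Compare treated with control; a lower Ct means more transcript.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)
```

For example, a target that crosses threshold three cycles earlier in treated cells than in control cells, with unchanged reference genes, corresponds to an eightfold increase.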

For expression analysis of other genes, RNA was isolated and reverse-transcribed 
to cDNA as previously described. cDNA was used for SYBR Green-based qRT-
PCR using gene-specific primers (Supplementary Table 5). Transcripts were quan-
titated by the comparative Ct method and normalized to endogenous controls, 18S,
β-actin and GAPDH.

Cyclic AMP HTRF assay. OPCs were plated at a density of 8,000 cells per well
onto a 384-well plate and incubated at 37 °C overnight. The assay was performed
using the HTRF cAMP dynamic 2 kit (CisBio Bioassays) per the manufacturer’s
protocol. IBMX (1 µM final concentration) was added as a cAMP stabilizer. Data
acquisition was performed in the time-resolved fluorescence resonance energy
transfer (FRET) mode on Envision (PerkinElmer). The ratio between the acceptor
fluorescence signal (A665 nm) and donor fluorescence signal (A620 nm) × 10^4,
representing the FRET between the conjugated cAMP and the anti-cAMP anti-
body, was calculated and plotted on the y axis. The ratio is inversely proportional
to the endogenous cAMP levels in the sample.
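
The ratio calculation can be written out as a short Python sketch. It assumes the usual HTRF readout, with acceptor emission at 665 nm, donor emission at 620 nm and a scaling factor of 10^4; the function name and the example counts are hypothetical.

```python
def htrf_ratio(acceptor_665, donor_620, scale=1e4):
    """Scaled acceptor/donor emission ratio for one well.

    In a competitive cAMP assay a higher ratio means more FRET and
    therefore less endogenous cAMP in the sample.
    """
    if donor_620 <= 0:
        raise ValueError("donor signal must be positive")
    return acceptor_665 / donor_620 * scale
```

Because the assay is competitive, a well whose ratio falls relative to the vehicle control is read as having accumulated more cAMP.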

Calcium influx assay. Effect of compound treatment on the release of Ca2+ ions
from the ER was measured using the FLIPR Tetra system (Molecular Devices).
Briefly, cells were treated with carbachol as an M1/M3 agonist and the compounds
were assayed for M1/M3 receptor antagonism using a Ca2+-sensitive Fluo-3AM dye.

EAE model. SJL/J (8-week-old female) mice were purchased from The Jackson
Laboratory. Mice were immunized subcutaneously with murine proteolipid pep- 
tide (PLP 139-151; Peptides International) mixed 1:1 with supplemented complete 
Freund’s adjuvant (CFA, Fisher) followed by Bordetella pertussis toxin (200 µg
per mouse, Sigma) on day 0 and day 2 as described*’. Clinical EAE was graded on 
a scale of 0-5 by established standard criteria as follows: score 0, no observable
disease; score 1, limp tail; score 2, limp tail and partial limb weakness; score 3, one 
hind limb paralyzed; score 4, both hind limbs paralyzed; score 5, moribund/dead. 
Mice received daily i.p. injections with compounds dissolved in saline. Dosing 
was commenced either on the day of PLP injections defined as prophylactic 
regimen or at the first appearance of EAE symptoms defined as therapeutic 
regimen and continued until day 30. 

Immunohistochemistry and antibodies. Spinal cords were isolated from mice
using standard protocols, fixed in Formalin-Zinc overnight followed by incuba- 
tion in 30% sucrose at 4°C overnight. Spinal cords were cross-sectioned into 8 
pieces and embedded in Tissue-Tek OCT (Electron Microscopy Sciences). Thin 
sections were cut using a Leica cryostat and stained for myelin and infiltrating 
cells using Luxol fast blue, PAS, haematoxylin and eosin using standard protocols. 
For OPC and oligodendrocyte staining, sections were treated with Cirtisolve (Fisher) 
antigen retrieval solution, followed by incubation with primary antibodies (Sup- 
plementary Table 3). Secondary antibodies were Alexa Fluor488 or Alexa Fluor647- 
conjugated anti-rabbit or anti-mouse (1:500; Invitrogen). Nuclei were counterstained 
with DAPI. Cells were imaged using a Zeiss confocal microscope. Quantitative image 
analysis was performed using ImageJ (NIH) and ImagePro (Media Cybernetics). 
Numerical results from the analysed images were exported for analysis in Excel 
(Microsoft). 

Brains from cuprizone-treated mice were isolated using standard protocols, 
fixed in formalin-zinc overnight, and embedded in paraffin. Sections were depar- 
affinized using a standard protocol and stained with Luxol fast blue (LFB) and 
haematoxylin and eosin as previously described. Slides were scanned using a Leica 
scanner. Images from at least 6 sections were collected and quantitative image 
analysis was performed using ImageJ (National Institutes of Health) and 
ImagePro (Media Cybernetics). Numerical results from the analysed images were 
exported for analysis in Excel (Microsoft). 

Electron microscopy. Mice were exsanguinated with 0.9% saline followed by 
perfusion fixation with 4% paraformaldehyde, 1.5% glutaraldehyde and 1 mM 
CaCl2 in 0.1 M cacodylate buffer and the spinal cords were exposed and fixed in
situ overnight. Following complete removal, immersion fixation continued over-
night at 4 °C in cacodylate buffered 2.5% glutaraldehyde with 1 mM CaCl2 and
then sliced into individual blocks. The tissues were then buffer washed, fixed in 
1% osmium tetroxide and subsequently dehydrated in graded ethanol series, 
transitioned in propylene oxide and embedded in EMbed 812 / Araldite (Electron 
Microscopy Sciences). Thick sections (1.5 µm) were cut, mounted on glass slides
and stained in toluidine blue for general assessment in the light microscope. 
Subsequently, 70-nm thin sections were cut with a diamond knife (Diatome), 
mounted on copper slot grids coated with parlodion and stained with uranyl 
acetate and lead citrate for examination on a Philips CM100 electron microscope 
(FEI) at 80kV. Images were documented using a Megaview III CCD camera 
(Olympus Soft Imaging Solutions). Images were analysed in Image-Pro for g-ratio 
measurements by manually drawing lines across 2 perpendicular diameters each 
for axons and axons plus myelin. Lengths of the lines (in pixels) as generated by 
Image-Pro were averaged across the 2 perpendicular measurements and converted 
to micrometres (µm) using the image scale bars. g-ratio is defined as the ratio of the
diameter of a given axon and the diameter of the axon plus myelin unit. Approximately 
1,000 axons and axon plus myelin units were measured for each treatment group. 
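
The g-ratio measurement described above can be sketched in Python as follows. The function names and the calibration argument (micrometres per pixel, as read from a scale bar) are illustrative assumptions; the measurements in the study were made in Image-Pro.

```python
def mean_diameter_um(d1_px, d2_px, um_per_px):
    """Average two perpendicular diameters (pixels) and convert to micrometres."""
    return (d1_px + d2_px) / 2.0 * um_per_px


def g_ratio(axon_px, fibre_px, um_per_px):
    """g-ratio = axon diameter / (axon plus myelin) diameter.

    axon_px and fibre_px are (d1, d2) pairs of perpendicular line lengths
    in pixels for the axon alone and for the axon plus its myelin sheath.
    """
    axon = mean_diameter_um(axon_px[0], axon_px[1], um_per_px)
    fibre = mean_diameter_um(fibre_px[0], fibre_px[1], um_per_px)
    if fibre <= 0 or axon > fibre:
        raise ValueError("fibre diameter must be positive and not smaller than the axon")
    return axon / fibre
```

Since the g-ratio is a quotient of two lengths, the pixel-to-micrometre factor cancels; the conversion matters only when the diameters themselves are reported. A thicker myelin sheath gives a lower g-ratio.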

In vitro T-cell assay. Spleens were isolated from SJL/J mice and teased apart to single cell
suspensions, red blood cells were lysed and remaining cells were isolated by
centrifugation, resuspended in complete media and counted. The cells were then
plated with soluble anti-CD28 (5 µg ml−1) at 2 × 10° cells per well on plates pre-
coated with anti-CD3 (10 µg ml−1). Compounds dissolved in dimethylsulphox-
ide (DMSO) (<0.1%) were added at 4 different dilutions (5 mM, 500 µM, 50 µM,
5 µM) and incubated at 37 °C, 5% CO2 for 24, 48, 72 and 96 h. Cells were analysed
by flow cytometry for CFSE dilution and CD25 and CD69 expression. Cells were
labelled with CFSE (3 µM) in PBS supplemented with bovine serum albumin
(BSA, 0.1%). Labelling was performed in the dark at 37 °C for 10 min with
occasional mixing. Labelling was stopped by addition of 5 volumes of ice-cold
PBS (0.1% BSA). Labelled cells were cultured for two days and analysed on a LSR
II flow cytometer (Becton Dickinson). The data were analysed using FlowJo
software (Tree Star).

In vivo T-cell assay. SJL/J (8-week-old female) mice were injected with PLP and 
Pertussis toxin to induce EAE as described in the previous section. On the same 
day, a prophylactic dosing regimen of compounds dissolved in saline was com- 
menced by i.p. injections. The mice were scored and dosed daily. Mice usually 
developed symptoms of EAE by day 9, with the peak of symptoms appearing 
around day 14. Mice were sacrificed on day 14 and spleens were removed. 
Blood was collected for sera. Spleens were teased apart to single cell suspensions, 
red blood cells were lysed and splenocytes were isolated by centrifugation. Cells 
were resuspended in complete medium defined as Dulbecco’s modified Eagle
medium (Gibco) supplemented with 10% fetal bovine serum (Gibco) and
counted. Cells were then plated at 2 × 10° cells per well on a 96-well plate and
treated with either 20 µg ml−1 PLP peptide or 10 ng ml−1 phorbol myristate
acetate (PMA) and 300 ng ml−1 ionomycin for 48 h to stimulate cytokine pro-
duction. Cytokine secretion was blocked during the last 5 h by treatment with
monensin. Supernatants were collected for ELISAs of IL-2, IFN-γ, IL-10 and
TNF-α. Cells were stained for flow cytometry using antibodies against CD4,
CD8, B220 and CD44. Cells were also stained for intracellular cytokines such
as IL-2, IFN-γ, IL-10 and TNF-α along with anti-CD4. Flow cytometry analysis
was performed on a LSR II flow cytometer (Becton Dickinson) and the data were
analysed using FlowJo software (Tree Star).

In vivo assays for T-cell-dependent B-cell responses. SJL/J mice were injected 
i.p. with TNP-KLH (Biosearch Technology; 25 µg per mouse in complete Freund’s
adjuvant). Benztropine (10 mg per kg) or vehicle was injected i.p. daily for 21 days.
Sera were collected on days 0, 7, 14 and 21. The collected sera were analysed by ELISA
for anti-TNP IgG levels.

Bone marrow-derived macrophages and M1/M2 polarization. Bone marrow- 
derived macrophage (BMM) cultures were generated from adult SJL/J mice. Briefly, 
femurs and tibias were collected bilaterally and marrow cores were flushed using 
syringes filled with DMEM medium supplemented with 1% penicillin/streptomycin,
1% HEPES, 1% sodium pyruvate, 0.1% β-mercaptoethanol and 10% FBS (com-
plete DMEM) (Gibco). Cell suspensions were filtered through a 0.2 µm cell
strainer, counted and plated in complete DMEM supplemented with 5% horse
serum and 20% supernatant from macrophage colony stimulating factor secreting
L929 (sL929) cells. The sL929 drives bone marrow cells towards a macrophage
phenotype (10 days). Non-adherent cells were removed at day 4. At collection
(day 10), 90 ± 0.7% of cells were macrophages (assessed by F4/80 immunostain-
ing). To promote differentiation into M1 or M2 macrophages, cells were treated
with LPS (100 ng ml−1; Sigma-Aldrich) plus IFNγ (20 ng ml−1; Peprotech) or IL-4
plus IL-13 (20 ng ml−1; Peprotech), respectively, for 16 h, in the presence or
absence of benztropine (5 µM). After M1 or M2 polarization, some cells (cultured
without benztropine) were saved and treated once again, as described above. The
viability of M1/M2 macrophages was analysed in every experiment by flow cyto-
metry of DAPI-stained cells (Invitrogen). Supernatants and cells were subse-
quently collected for cytokine (TNF-α) and phenotypic analysis, respectively.
M1 and M2 polarization was assessed using the following fluorochrome-labelled 
antibodies: anti-F4/80, rat anti-mouse mannose receptor (CD206), rat anti-mouse 
CD86, rat anti-mouse CD80 and anti-mouse I-A/I-E antibodies (BioLegend). 

Isolation of spleen and leukocyte isolation from spinal cords. EAE mice were
euthanized, spleens were isolated from the mice as per standard protocols and 
crushed by mechanical disruption using a tissue homogenizer. Spinal cords were 
isolated from the mice, treated with collagenase and ground to a single cell 
suspension. Suspensions from 2 spinal cords from mice with similar clinical 
severity scores were combined and centrifuged through gradients of 30% and 
70% Percoll to isolate infiltrating leukocytes. Total RNA was isolated and used for 
gene expression analysis as previously described. 

Adoptive transfer EAE. SJL/J donor mice were purchased from The Jackson 
Laboratories. Mice were immunized subcutaneously with murine proteolipid 
peptide (PLP 139-151; Peptides International) mixed 1:1 with supplemented com-
plete Freund’s adjuvant (CFA, Fisher). Mice were then injected with either saline
or benztropine (10 mg per kg, daily, i.p.) for 7 days, and re-immunized with
murine PLP emulsified 1:1 with incomplete Freund’s adjuvant. Daily benztropine
or vehicle injections were continued until day 10, when spleens were isolated, teased
apart to single cell suspensions, red blood cells lysed and splenocytes isolated by
centrifugation. Cells were resuspended in complete media (RPMI 1640 supple-
mented with 10% fetal bovine serum (FBS), 1.25% HEPES buffer, 1% sodium
pyruvate, 1% penicillin-streptomycin, 1% glutamine, 1% non-essential amino acids,
0.01% 2-mercaptoethanol (2-ME)) (Sigma-Aldrich) and counted. Splenocytes
were analysed by flow cytometry to determine cell viability and percentage of
CD4+ T cells as described earlier. Splenocytes from vehicle-treated mice were
further cultured in vitro in the presence of PLP 139-151 (30 µg ml−1), interleukin 12
(IL-12) (10 ng ml−1) and DMSO (<0.1%), whereas splenocytes from benztropine-
treated donor mice were further cultured in vitro in the presence of PLP 139-151,
interleukin 12 (IL-12) (10 ng ml−1) and benztropine (5 µM) for 72 h at 37 °C. At the
end of 72 h, cells were analysed by flow cytometry to determine cell viability and
percentage of CD4+ T cells as described earlier (~80% activated T cells after
in vitro culture). Cells were then washed and resuspended in PBS (50 million
cells ml−1) and 200 µl of cell suspension was injected into 4 groups (according
to the following table) of naive recipient SJL/J mice by intravenous injection
followed by Bordetella pertussis toxin (i.p., 200 µg per mouse, Sigma) on day 0
and day 2. Clinical EAE was graded on a scale of 0-5 by established standard
criteria as follows: score 0, no observable disease; score 1, limp tail; score 2, limp
tail and partial limb weakness; score 3, one hind limb paralyzed; score 4, both hind
limbs paralyzed; score 5, moribund/dead. Mice received daily i.p. injections of
either saline or benztropine (10 mg per kg) according to Extended Data Fig. 8c, d.

Cuprizone model. C57BL/6 mice (8-week-old females) were purchased from the
breeding colony at The Scripps Research Institute. Mice were fed 0.2% w/w 
cuprizone (bis-cyclohexanone oxaldihydrazone, Sigma-Aldrich) mixed into a 
ground standard rodent chow (Harlan). Cuprizone diet was maintained for 7 weeks; 
thereafter mice were put on a normal chow for another 5 weeks. Compounds were 
dissolved in saline and daily i.p. injections were initiated at the withdrawal of the
cuprizone diet. At different time points (0, 1, 2, 3, 4 and 5 weeks after cuprizone 
withdrawal), animals were euthanized. Brains were extracted, fixed in formalin- 
zinc, paraffin-embedded, sectioned and stained as described in the immunohisto- 
chemistry section. 

Image analysis of brain sections. Using ImageJ (NIH), images were converted to
a 256-shade grey scale. The 256 shades of grey were then divided into 5 bins of 50 
shades each: 0-50, 50-100, 100-150, 150-200 and 200-256, with 0 being the pixel 
with darkest shade of grey and 256 being the pixel with the lightest shade of grey. 
Each bin was assigned an arbitrary colour: 0-50 (red), 50-100 (yellow), 100-150 
(green), 150-200 (light blue) and 200-256 (dark blue). Based on the intensity of 
staining, each pixel was classified into one of the 5 bins using Image-Pro plus 
software. The number of objects in the corpus callosum region classified into each 
bin was counted. Data was exported to Excel (Microsoft) for analysis. 
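
The binning scheme above can be illustrated with a short Python sketch. It assumes 8-bit grey values (0 to 255), assigns each boundary value to the higher bin, and counts individual pixels; the study itself counted objects in Image-Pro Plus, so the names and these conventions are assumptions made for illustration.

```python
from collections import Counter

# Colours assigned to the five bins, from darkest to lightest grey.
BIN_COLOURS = ("red", "yellow", "green", "light blue", "dark blue")


def bin_index(grey):
    """Map an 8-bit grey value to bin 0-4 (0-49, 50-99, 100-149, 150-199, 200-255)."""
    if not 0 <= grey <= 255:
        raise ValueError("grey value must be between 0 and 255")
    return min(grey // 50, 4)  # the last bin absorbs values 200-255


def bin_counts(pixels):
    """Count how many grey values fall into each of the five bins."""
    counts = Counter(bin_index(p) for p in pixels)
    return [counts.get(i, 0) for i in range(len(BIN_COLOURS))]
```

Applied to the pixels of the corpus callosum region, the counts in the darkest bins give a simple readout of how much densely stained (myelinated) tissue is present.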

Animal use statement. All experiments were performed in accordance with
approved Institutional Animal Care and Use Committee (IACUC) protocols of 
The Scripps Research Institute or Hokkaido University Institute for Genetic 
Medicine. 

Extended Data Figure 1 | High-throughput screen to identify inducers of
OPC differentiation. a, Rat primary OPCs in basal differentiation media
treated with DMSO (<0.1%) or thyroid hormone (T3; 1 µM) for 6 days in
culture, fixed and stained using antibodies for myelin basic protein (MBP),
2',3'-cyclic-nucleotide 3'-phosphodiesterase (CNP) and oligodendrocyte
marker O4. A2B5+ OPCs differentiated into immature oligodendrocytes that
express CNP and O4, but not MBP, upon reduction of PDGF-AA. T3 added as
a positive control induced differentiation to mature cells that express MBP.
Scale bars, 100 µm. b, Schematic representation of the high-throughput
screening platform used to identify inducers of OPC differentiation. c, Inducers
of OPC differentiation identified as hits from a screen of known biologically
active compounds. Scale bars, 100 µm; inset, 40 µm.
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Extended Data Figure 2 | Benztropine induces dose-dependent OPC 
differentiation in vitro to mature oligodendrocytes. a, Dose response assay 
used to confirm primary screening activity of benztropine and determine 
potency (EC50). OPCs were treated with benztropine and immunostained using 
antibodies for MBP (n = 3, mean and s.d.). b, Images showing dose-dependent 
induction of OPC differentiation after treatment with benztropine. (Scale bars, 
100 μm; inset, 40 μm). OPCs in basal differentiation media treated with DMSO 
(<0.1%), T3 (1 μM) or benztropine (1.5 μM) for 6 days and analysed for MBP 
and MOG expression by western blot (c) and by qRT-PCR (d) (n = 3, mean 
and s.d.). e, OPCs were plated in differentiation medium and treated with 
DMSO (<0.1%), benztropine (1.5 μM) or T3 (1 μM) for 6 days. Cells were fixed 
and immunostained for myelin basic protein (MBP), myelin oligodendroglial 
glycoprotein (MOG), CNP, oligodendrocyte marker O1, oligodendrocyte 
marker O4, glial marker SOX10, proteolipid peptide (PLP), OLIG1 and OLIG2. 
Representative images showing expression of mature oligodendrocyte markers 
in benztropine- and T3-treated cells, but not DMSO-treated cells. Scale bars, 
100 μm; inset, 40 μm. f, Expression of cell cycle genes by qRT-PCR (n = 3, 
mean and s.d., *P < 0.05, t-test). g, OPCs plated in basal differentiation 
medium and treated with benztropine (1.5 μM) on various days (0, 1, 2, 3, 4 and 
5), fixed on day 6 and immunostained for MBP (n = 3, mean and s.d.). h, OPCs 
plated in basal differentiation medium, treated with benztropine (1.5 μM) on 
the same day, fixed on various days (3, 4, 5 and 6) following compound 
treatment and immunostained for MBP (n = 3, mean and s.d.). 
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Extended Data Figure 3 | Benztropine induces OPC differentiation and in 
vitro myelination through M1/M3 muscarinic receptor antagonism and has 
no effect on histamine or nicotinic signalling. a, Mouse OPCs co-cultured 
with mouse cortex-derived cells in the presence of DMSO or benztropine and 
immunostained for MBP (red) and nuclei with Hoechst 33342 (blue). Scale 
bars, 100 μm. b, Quantification of MBP staining of mouse OPCs treated with 
DMSO or benztropine. c, Analysis of myelination in OPCs with neurons 
co-culture. Arrowheads point to regions of myelination. Scale bars, 20 μm. 
d, Quantification of fraction of myelinating oligodendrocytes in OPCs with 
neurons co-cultures (n = 10, mean and s.e.m., **P < 0.01, ANOVA with 
Bonferroni correction). OPCs co-treated with benztropine (1.5 μM) and 
carbachol (2.3 μM) for 6 days and stained for MBP (green) (Scale bars, 100 μm; 
inset, 40 μm). e, Antagonism of benztropine-induced OPC differentiation by 
muscarinic agonist carbachol. f, Quantification of MBP staining of OPCs 
co-treated with benztropine (1.5 μM) and muscarinic receptor agonist carbachol 
for 6 days under basal differentiation conditions (n = 3, mean and s.d.). 

g-k, OPCs co-treated with benztropine (0.8 μM) and either nicotine 
(g), histamine (h), histamine receptor agonist histamine trifluoromethyl 
toluidide (HTMT) (i), dopamine receptor agonist quinpirole (j) or dopamine 
receptor antagonist haloperidol (k) (n = 3, mean and s.d., ns = not significant). 
l, Various nicotinic receptor antagonists have no effect on OPC differentiation. 
m, Benztropine blocks carbachol- and muscarine-induced activation of Notch 
signalling measured by western blot for intracellular domain of Notch1 
(a.u., arbitrary unit, n = 3, mean and s.d., *P < 0.05, t-test). n, Naive whole rat 
brain and rat primary OPCs treated with DMSO (<0.1%) or T3 (1 μM) for 
6 days tested for expression of muscarinic receptors and choline acetyl 
transferase (ChAT) by PCR using gene-specific primers. o, Quantification of 
M1, M2, M3, M4 and ChAT expression by qRT-PCR (n = 3, mean and s.d., 
expression fold change normalized to OPCs). p, OPCs treated with benztropine 
(25 μM) and pelleted for western blot analysis of total protein. q, Carbachol 
induced a dose-dependent increase in intracellular Ca2+ levels, whereas 
benztropine and atropine (a muscarinic antagonist) dose-dependently blocked 
carbachol (50 μM) induced calcium influx through antagonism of M1/M3 
muscarinic receptors. r, Benztropine (13 μM) had no effect on the levels of 
cAMP. Forskolin is a positive control for increasing intracellular cAMP 
(n = 3, mean and s.d., *P < 0.05, t-test). 
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Extended Data Figure 4 | Benztropine dose-dependently reduces clinical 
severity and induces remyelination in the PLP-induced EAE model. 
a, Clinical severity scores of EAE mice treated with various doses of benztropine 
in the prophylactic mode (n = 8, mean and s.e.m.). b, EAE mice treated with 
benztropine (10 mg per kg) or vehicle in the therapeutic mode and spinal cord 
sections from mice representative of the average group scores during the relapse 
phase of EAE stained with Luxol fast blue and H&E, or Luxol fast blue only. 
Arrows point to regions of lymphocyte infiltration (LFB + H&E) or 
demyelination (LFB). Scale bars represent 100 μm. EAE mice treated with 
benztropine (10 mg per kg) or vehicle in prophylactic mode. c, Spinal cord 
sections from mice representative of the average group scores on day 8, 11 and 
14 immunostained with antibodies specific to CD45 and GSTπ. d, Mean 
clinical scores of mice at the time of spinal cord isolation and quantification of 
the infiltrated areas (CD45+) and number of GSTπ+ cells (n = 8, mean and 
s.e.m., **P < 0.01, t-test). Scale bars, 100 μm. e, EAE mice treated with 
benztropine (10 mg per kg) or vehicle in prophylactic mode and spinal cord 
sections from mice representative of the average group scores on day 11 and 14, 
immunostained with antibody specific to MBP. Arrows point to regions of 
lymphocyte infiltration. Scale bars, 100 μm. f, Electron microscopy images 
showing myelin around axons in normal mice, vehicle-treated mice and mice in 
remission. Scale bars as indicated. g, Analysis of electron microscopy images 
indicating distribution of axonal diameters measured for 4 groups. h, Analysis 
of electron microscopy images indicating distribution of g-ratios of axons for 4 
groups. i, Scatterplot of g-ratios in relation to spinal cord axonal diameters 
(n = 1,000, ***P < 0.001, one-way ANOVA, exponential trend line). 
j, Quantification of the number of axons associated with oligodendrocytes 
(n = 25, mean and s.e.m., **P < 0.01, t-test). Oligodendrocytes were identified 
visually by their cytoplasmic processes wrapping around axons. 


©2013 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


[Extended Data Figure 5 graphic. Recoverable labels: CFSE; CD4+CD69+; CD4+CD25+; Concentration (μM); Days. Treatments shown: Unstimulated, DMSO, Benztropine, Clemastine, Scopolamine, Quinuclidine, Ipratropium, Atropine, Oxybutinin, Propiverine, Mycophenolate, FTY720; Vehicle versus Benztropine.] 

Extended Data Figure 5 | Benztropine has no effect on in vitro and in vivo 
immunological responses in EAE mice. a, Benztropine and various 
muscarinic antagonists have no effect on in vitro T-cell proliferation measured 
using carboxyfluorescein succinimidyl ester (CFSE) labelling, whereas 
mycophenolate and FTY720 suppress T-cell proliferation as determined by the 
percentage of CD4+ T-cell-gated populations positive for the given marker. 
b, c, Various muscarinic antagonists have no effect on T-cell activation as 
measured by CD4+CD25+, CD4+CD69+, CD8+CD25+ and CD8+CD69+ 
cell populations. FTY720 and mycophenolate serve as positive controls for 
suppression of T-cell activation. d, Representative flow cytometry scatter plots 
show similar numbers of CD4+, CD8+, and CD44+ cells in spleens isolated 
from vehicle- and benztropine-treated mice. e, f, Total splenocytes isolated 
from benztropine (10 mg per kg) or vehicle treated (14 days in the prophylactic 
mode) naive SJL/J (e) or EAE (f) mice analysed for various populations of 
immune cells and cytokine secretion. Benztropine treatment had no effect on 
the numbers of total splenocytes, CD4+ T cells, CD8+ T cells, CD4+CD44+ T 
cells and CD8+CD44+ T cells. Benztropine treatment showed a minor, but 
significant decrease in the number of B cells (n = 5, mean and s.e.m., *P < 0.05, 
t-test). Benztropine had no effect on cytokine production from CD4+ T cells 
expressing IL-2, IL-10, TNF-α or IFN-γ (n = 5, mean and s.e.m.). 
g, Benztropine showed no effect on keyhole limpet hemocyanin protein 
conjugated to 2,4,6-trinitrophenyl hapten (TNP-KLH)-induced T-cell-dependent 
B-cell response. Mice were injected with TNP-KLH in adjuvant and 
treated with vehicle or benztropine (10 mg per kg) daily. Serum was isolated at 
various time points and anti-TNP-IgG levels were measured by ELISA 
(3 replicate ELISAs, n = 5 mice per group, mean and s.e.m.). 
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Extended Data Figure 6 | Benztropine does not affect derivation and in 
vitro polarization of macrophages from bone marrow derived monocytes. 
a, Flow cytometry analysis of bone marrow derived monocytes treated in vitro 
with either DMSO (<0.1% v/v) or benztropine (5 μM) for 24 h followed 
by 24 h treatment with LPS (100 ng ml−1) plus IFNγ (20 ng ml−1) for the 
expression of M1 markers: CD86, MHC-II and CD80, or 24 h treatment with 
IL-4 plus IL-13 (20 ng ml−1 each) for the expression of M2 marker CD206. 
b, M1/M2 polarized macrophages re-stimulated using either LPS (100 ng ml−1) 
plus IFNγ (20 ng ml−1) (M1) or IL-4 plus IL-13 (20 ng ml−1 each) (M2) for 16 h 
in the presence of either benztropine (5 μM) or DMSO and analysed for the 
expression of M1 (CD80) or M2 (CD206) markers by flow cytometry. 
c, d, Treatment with LPS (100 ng ml−1) plus IFNγ (20 ng ml−1) induced the 
expression of the prototypical M1 cytokine TNF-α as detected by intracellular 
flow cytometry (c) and ELISA (d) with no significant differences between 
DMSO or benztropine (5 μM) treated cells (data representative of 2 replicate 
experiments). 
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Extended Data Figure 7 | Benztropine does not affect in vivo polarization 
of macrophages in the spleen or spinal cord. EAE mice were treated with 
benztropine (10 mg per kg) or vehicle for 14 days in the prophylactic mode. 
a, Mean clinical EAE scores for mice treated with vehicle or benztropine (n = 6, 
mean and s.e.m. for spleens and spinal cords, n = 12 for isolated spinal 
leukocytes analysis). b–e, Spleens and spinal leukocytes were isolated from the 
mice as described in Methods. Total RNA was isolated, reverse transcribed and 
gene expression was measured by qRT-PCR. Expression for each marker was 
normalized to the average gene expression of the vehicle group. No significant 
differences were observed in the expression of markers of macrophage 
polarization in the spleen (b), whole spinal cords (c) and leukocytes 
(d, e) isolated from spinal cords (n = 6 mice per group for spleens and spinal 
cords, n = 12 mice per group (n = 6 for RT-PCR) for spinal leukocytes 
analysis. Error bars represent s.e.m.). 




[Extended Data Figure 8 graphic. Panels a and b are plotted against Days for four donor --> recipient groups: Vehicle --> Vehicle, Vehicle --> Benztropine, Benztropine --> Vehicle and Benztropine --> Benztropine. Panel c labels: donor mice immunized with PLP and given vehicle or BT daily i.p. for 10 days; spleens harvested on day 10 and RBCs lysed; in vitro culture for 4 days with PLP + IL12 + DMSO or PLP + IL12 + BT (5 μM); FACS analysis (~50% CD4+ cells, 95% viability in both groups); cells injected i.p. into recipient mice with pertussis toxin i.p.; recipients given daily vehicle or daily BT (10 mg per kg) i.p. Panel d column headings: Group Number, Donor Mice, In vitro, Recipient mice.] 

Extended Data Figure 8 | Benztropine does not affect clinical severity in an 
adoptive transfer model of EAE. a, b, Incidence of adoptive transfer of EAE 
(a) and mean clinical EAE scores (b) in mice injected with splenocytes isolated 
from benztropine- or vehicle-treated donor groups. T cells obtained from either 
benztropine- (BT, 10 mg per kg) or vehicle-treated donor EAE mice and further 
expanded in the presence or absence of benztropine (5 μM) were able to 
adoptively transfer EAE to naive recipient mice. Benztropine-treated recipient 
mice showed little to no clinical symptoms of EAE compared to vehicle-treated 
recipient mice, whether injected with benztropine- or vehicle-treated donor 
splenocytes (n = 6 mice, mean and s.e.m., *P < 0.05, t-test). c, Schematic for 
the adoptive transfer EAE model. d, Table showing various groups and 
treatments. e, ELISA for anti-PLP IgG shows equivalent PLP response in donor 
mice treated with either vehicle or benztropine (n = 30, mean and s.e.m.). 

cuprizone model. a, Luxol fast blue (LFB) and H&E staining was performed 
on sections from the corpus callosum region of brains isolated from mice 
[…] exposure to cuprizone. b, Images were converted to a 256 shade grey scale. 
c, The 256 shades of grey were divided into 5 bins of 50 shades each (1-50, 
51-100, 101-150, 151-200 and 201-256). Number of objects in the corpus 
[…] d, Representative images of Image-Pro rendering of the quantification of 
objects in each bin. e, Quantification of Luxol fast blue staining on week 2 shows 
[…] along with corresponding reduction in the number of lighter pixels (151-200). 
Six images per mouse were analysed and four mice per group were used at 
each time point (mean and s.d., *P < 0.05, t-test). Scale bars, 200 μm. 
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Extended Data Figure 10 | Effect of the addition of benztropine to 
interferon-β and FTY720 treatments. a, b, EAE severity scores for mice 
treated with various doses of FTY720 (a) or interferon-β (b). c, Mice treated 
therapeutically with FTY720 (1 mg per kg) in combination with a sub-optimal 
dose of benztropine (BT, 2.5 mg per kg) show significantly decreased clinical 
severity compared to FTY720 (1 mg per kg) or benztropine (2.5 mg per kg) 
alone. d, EAE mice treated with interferon-β (IFN; 10,000 U per mouse) in 
combination with benztropine (2.5 mg per kg) show significantly decreased 
clinical severity compared to interferon-β (IFN; 10,000 U per mouse) or 
benztropine (2.5 mg per kg) alone. e, EAE mice treated with a tenfold lower 
dose of FTY720 (0.1 mg per kg) in combination with benztropine (2.5 mg per 
kg). f, EAE mice treated with a tenfold lower dose of FTY720 (0.1 mg per kg) in 
combination with benztropine (2.5 mg per kg) show clinical severity 
comparable to optimal dose of FTY720 (1 mg per kg) (n = 8 mice per group, 
mean and s.e.m., *P < 0.05; t-test). g, i, Spinal cord sections from EAE mice 
treated with the indicated drug(s) for 14 days in the prophylactic mode and 
immunostained for CD45 (immune cells) and GSTπ (oligodendrocytes) 
showing infiltration (g) and oligodendrocytes (i). h, j, Quantification of the 
number of CD45+ (h) and GSTπ+ (j) cells showing a decrease in infiltrating 
cells with FTY720 treatment and an increase in oligodendrocyte numbers with 
benztropine treatment and synergy between benztropine (2.5 mg per kg) and 
FTY720 (0.1 mg per kg) (n = 5, mean and s.e.m., ns, not significant). Scale bars, 
100 μm. k, Mean clinical EAE scores for mice at the time of spinal cord isolation 
(n = 8, mean and s.e.m.). 
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Mutational landscape and significance 
across 12 major cancer types 


Cyriac Kandoth, Michael D. McLellan, Fabio Vandin, Kai Ye, Beifang Niu, Charles Lu, Mingchao Xie, Qunyuan Zhang, 
Joshua F. McMichael, Matthew A. Wyczalkowski, Mark D. M. Leiserson, Christopher A. Miller, John S. Welch, 
Matthew J. Walter, Michael C. Wendl, Timothy J. Ley, Richard K. Wilson, Benjamin J. Raphael & Li Ding 


The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across 
thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions 
from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of 
mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/ 
carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes 
from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and 
receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone 
modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in 
these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of 
driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show 
tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis 
identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal 
architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for 
developing new diagnostics and individualizing cancer treatment. 


The advancement of DNA sequencing technologies now enables the 
processing of thousands of tumours of many types for systematic 
mutation discovery. This expansion of scope, coupled with appreciable 
progress in algorithms, has led directly to characterization of significant 
functional mutations, genes and pathways. Cancer encompasses 
more than 100 related diseases, making it crucial to understand the 
commonalities and differences among various types and subtypes. 
TCGA was founded to address these needs, and its large data sets 
are providing unprecedented opportunities for systematic, integrated 
analysis. 

We performed a systematic analysis of 3,281 tumours from 12 cancer 
types to investigate underlying mechanisms of cancer initiation and 
progression. We describe variable mutation frequencies and contexts 
and their associations with environmental factors and defects in DNA 
repair. We identify 127 significantly mutated genes (SMGs) from diverse 
signalling and enzymatic processes. The finding of a TP53-driven 
breast, head and neck, and ovarian cancer cluster with a dearth of other 
mutations in SMGs suggests common therapeutic strategies might be 
applied for these tumours. We determined interactions among muta- 
tions and correlated mutations in BAP1, FBXW7 and TP53 with det- 
rimental phenotypes across several cancer types. The subclonal structure 
and transcription status of underlying somatic mutations reveal the 
trajectory of tumour progression in patients with cancer. 


Standardization of mutation data 

Stringent filters (Methods) were applied to ensure high quality muta- 
tion calls for 12 cancer types: breast adenocarcinoma (BRCA), lung 
adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), 
uterine corpus endometrial carcinoma (UCEC), glioblastoma multiforme 
(GBM), head and neck squamous cell carcinoma (HNSC), colon and 
rectal carcinoma (COAD, READ), bladder urothelial carcinoma (BLCA), 
kidney renal clear cell carcinoma (KIRC), ovarian serous carcinoma 
(OV) and acute myeloid leukaemia (LAML; conventionally called 
AML) (Supplementary Table 1). A total of 617,354 somatic mutations, 
consisting of 398,750 missense, 145,488 silent, 36,443 nonsense, 9,778 
splice site, 7,693 non-coding RNA, 523 non-stop/readthrough, 15,141 
frameshift insertions/deletions (indels) and 3,538 inframe indels, were 
included for downstream analyses (Supplementary Table 2). 
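
As a quick consistency check on the figures above, the eight per-class counts can be summed and compared with the stated total. The sketch below uses only the numbers given in the text; the names `MUTATION_COUNTS` and `class_fractions` are illustrative.

```python
# Mutation counts by class, copied from the text above.
MUTATION_COUNTS = {
    "missense": 398_750,
    "silent": 145_488,
    "nonsense": 36_443,
    "splice site": 9_778,
    "non-coding RNA": 7_693,
    "non-stop/readthrough": 523,
    "frameshift indel": 15_141,
    "inframe indel": 3_538,
}
REPORTED_TOTAL = 617_354


def class_fractions(counts):
    """Return each class's share of all mutations as a fraction of 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(MUTATION_COUNTS.values())
fractions = class_fractions(MUTATION_COUNTS)
```

The classes sum exactly to the reported 617,354, and missense changes make up roughly 65% of the call set.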


Distinct mutation frequencies and sequence context 


Figure 1a shows that AML has the lowest median mutation frequency 
and LUSC the highest (0.28 and 8.15 mutations per megabase (Mb), 
respectively). Besides AML, all types average over 1 mutation per Mb, 
substantially higher than in paediatric tumours. Clustering illustrates 
that mutation frequencies for KIRC, BRCA, OV and AML are 
normally distributed within a single cluster, whereas other types have 
several clusters (for example, 5 and 6 clusters in UCEC and COAD/ 
READ, respectively) (Fig. 1a and Supplementary Table 3a, b). In UCEC, 
the largest patient cluster has a frequency of approximately 1.5 muta- 
tions per Mb, and the cluster with the highest frequency is more than 
150 times greater. Multiple clusters suggest that factors other than age 
contribute to development in these tumours. Indeed, there is a 
significant correlation between high mutation frequency and DNA 
repair pathway genes (for example, PRKDC, TP53 and MSH6) (Sup- 
plementary Table 3c). Notably, PRKDC mutations are associated with 
high frequency in BLCA, COAD/READ, LUAD and UCEC, whereas 
TP53 mutations are related with higher frequencies in AML, BLCA, 
BRCA, HNSC, LUAD, LUSC and UCEC (all P < 0.05). Mutations in 


1The Genome Institute, Washington University in St Louis, Missouri 63108, USA. 2Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA. 3Department of Genetics, 
Washington University in St Louis, Missouri 63108, USA. 4Department of Medicine, Washington University in St Louis, Missouri 63108, USA. 5Siteman Cancer Center, Washington University in St Louis, 
Missouri 63108, USA. 6Department of Mathematics, Washington University in St Louis, Missouri 63108, USA. 
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Figure 1 | Mutation frequencies, spectra and contexts across 12 cancer 
types. a, Distribution of mutation frequencies across 12 cancer types. Dashed 
grey and solid white lines denote average across cancer types and median for 
each type, respectively. b, Mutation spectrum of six transition (Ti) and 
transversion (Tv) categories for each cancer type. c, Hierarchically clustered 
mutation context (defined by the proportion of A, T, C and G nucleotides 
within ±2 bp of variant site) for six mutation categories. Cancer types 
correspond to colours in a. Colour denotes degree of correlation: yellow 
(r = 0.75) and red (r = 1). 


POLQ and POLE associate with high frequencies in multiple cancer types; 
POLE association in UCEC is consistent with previous observations. 
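
The mutation frequencies discussed here are expressed in mutations per megabase. A minimal sketch of that calculation, and of the per-type median, follows. The input format (cancer type, mutation count, number of covered bases) and the function names are assumptions made for illustration; the text does not describe how coverage was measured per sample, and the toy numbers are not data from the study.

```python
from statistics import median


def mutations_per_mb(n_mutations, covered_bases):
    """Mutation frequency in mutations per megabase of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


def median_frequency_by_type(samples):
    """Median mutations per Mb for each cancer type.

    samples: iterable of (cancer_type, n_mutations, covered_bases) tuples.
    """
    by_type = {}
    for cancer_type, n_mutations, covered_bases in samples:
        freq = mutations_per_mb(n_mutations, covered_bases)
        by_type.setdefault(cancer_type, []).append(freq)
    return {t: median(freqs) for t, freqs in by_type.items()}


# Toy samples, each with 30 Mb of covered sequence (hypothetical values).
toy_samples = [
    ("AML", 9, 30_000_000),
    ("AML", 6, 30_000_000),
    ("AML", 12, 30_000_000),
    ("LUSC", 240, 30_000_000),
]
toy_medians = median_frequency_by_type(toy_samples)
```

Using the median rather than the mean keeps a handful of hypermutated samples from dominating the summary for a tumour type.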

Comparison of spectra across the 12 types (Fig. 1b and Supplemen- 
tary Table 3d) reveals that LUSC and LUAD contain increased C>A 
transversions, a signature of cigarette smoke exposure. Sequence 
context analysis across 12 types revealed the largest difference being 
in C>T transitions and C>G transversions (Fig. 1c). The frequency 
of thymine 1-bp (base pair) upstream of C>G transversions is mark- 
edly higher in BLCA, BRCA and HNSC than in other cancer types 
(Extended Data Fig. 1). GBM, AML, COAD/READ and UCEC have 
similar contexts in that the proportions of guanine 1 base downstream 
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of C>T transitions are between 59% and 67%, substantially higher 
than the approximately 40% in other cancer types. Higher frequencies 
of transition mutations at CpG in gastrointestinal tumours, including 
colorectal, were previously reported. We found three additional 
cancer types (GBM, AML and UCEC) clustered in the C>T mutation 
at CpG, consistent with previous findings of aberrant DNA methylation 
in endometrial cancer and glioblastoma. BLCA has a unique 
signature for C>T transitions compared to the other types (enriched 
for TC) (Extended Data Fig. 1). 
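
The six transition and transversion categories used in this comparison arise from collapsing each single-base change onto the strand where the reference base is a pyrimidine, so that G>T and C>A are counted together. The sketch below illustrates that convention and the CpG test for C>T transitions. It is an illustration of the standard convention rather than the authors' pipeline, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TRANSITIONS = {"C>T", "T>C"}


def substitution_class(ref, alt):
    """Collapse a single-base change onto its pyrimidine-reference form.

    Returns one of: C>A, C>G, C>T, T>A, T>C, T>G.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change as seen on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref, alt):
    """True for transitions (C>T, T>C); False for transversions."""
    return substitution_class(ref, alt) in TRANSITIONS


def is_cpg_transition(ref, alt, upstream, downstream):
    """True for a C>T change whose C is followed by G (a CpG site).

    upstream and downstream are the neighbouring bases on the reported strand.
    """
    if substitution_class(ref, alt) != "C>T":
        return False
    if ref.upper() == "C":
        return downstream.upper() == "G"
    # G>A on the reported strand: the C lies on the opposite strand, where its
    # downstream neighbour is the complement of the reported upstream base.
    return COMPLEMENT[upstream.upper()] == "G"
```

Tallying `substitution_class` over all substitutions in a tumour type gives a six-category spectrum, and the fraction of C>T changes passing `is_cpg_transition` corresponds to the guanine-downstream proportion discussed in the text.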


Significantly mutated genes 
Genes under positive selection, either in individual or multiple tumour 
types, tend to display mutation frequencies above background. 
Our statistical analysis, guided by expression data and curation (Methods), 
identified 127 such genes (SMGs; Supplementary Table 4). These SMGs 
are involved in a wide range of cellular processes, broadly classified 
into 20 categories (Fig. 2), including transcription factors/regulators, 
histone modifiers, genome integrity, receptor tyrosine kinase signalling, 
cell cycle, mitogen-activated protein kinases (MAPK) signalling, 
phosphatidylinositol-3-OH kinase (PI(3)K) signalling, Wnt/β-catenin 
signalling, histones, ubiquitin-mediated proteolysis, and splicing (Fig. 2). 
The identification of MAPK, PI(3)K and Wnt/β-catenin signalling pathways 
is consistent with classical cancer studies. Notably, newer categories 
(for example, splicing, transcription regulators, metabolism, proteolysis 
and histones) emerge as exciting guides for the development of new 
therapeutic targets. Genes categorized as histone modifiers (Z = 0.57), 
PI(3)K signalling (Z = 1.03), and genome integrity (Z = 0.66) all relate 
to more than one cancer type, whereas transcription factor/regulator 
(Z = 0.40), TGF-β signalling (Z = 0.66), and Wnt/β-catenin signalling 
(Z = 0.55) genes tend to associate with single types (Methods). 
Notably, 3,053 out of 3,281 total samples (93%) across the Pan- 
Cancer collection had at least one non-synonymous mutation in at 
least one SMG. The average number of point mutations and small 
indels in these genes varies across tumour types, with the highest (~6 
mutations per tumour) in UCEC, LUAD and LUSC, and the lowest 
(~2 mutations per tumour) in AML, BRCA, KIRC and OV. This 
suggests that the numbers of both cancer-related genes (only 127 
identified in this study) and cooperating driver mutations required 
during oncogenesis are small (most cases only had 2-6) (Fig. 3), 
although large-scale structural rearrangements were not included in 
this analysis. 
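
The two summary statistics in this paragraph, the share of samples with at least one SMG mutation and the number of SMG mutations per tumour, can be computed from a simple per-sample list of mutated genes. The sketch below assumes such a list as input (one gene symbol per non-synonymous mutation); the data layout, the function names and the toy samples are illustrative and are not taken from the study.

```python
def smg_burden(sample_mutations, smgs):
    """Count, per sample, the mutations that fall in significantly mutated genes.

    sample_mutations: dict mapping sample id -> list of mutated gene symbols,
    one entry per non-synonymous mutation.
    smgs: set of SMG gene symbols.
    """
    return {
        sample: sum(1 for gene in genes if gene in smgs)
        for sample, genes in sample_mutations.items()
    }


def fraction_with_hit(burden):
    """Fraction of samples carrying at least one SMG mutation."""
    return sum(1 for n in burden.values() if n > 0) / len(burden)


# Toy cohort (hypothetical samples; gene symbols are real SMGs named in the text).
toy_smgs = {"TP53", "PIK3CA", "KRAS", "APC"}
toy_cohort = {
    "sample_1": ["TP53", "PIK3CA", "TTN"],
    "sample_2": ["APC", "KRAS", "TP53"],
    "sample_3": ["TTN"],
    "sample_4": ["TP53"],
}
toy_burden = smg_burden(toy_cohort, toy_smgs)
toy_fraction = fraction_with_hit(toy_burden)

# The percentage quoted in the text follows directly from the two counts.
reported_percent = round(100 * 3053 / 3281)
```

In the toy cohort three of four samples carry an SMG mutation; the same ratio applied to the reported counts reproduces the 93% figure.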


Common mutations 

The most frequently mutated gene in the Pan-Cancer cohort is TP53 
(42% of samples). Its mutations predominate in serous ovarian (95%) 
and serous endometrial carcinomas (89%) (Fig. 2). TP53 mutations 
are also associated with basal subtype breast tumours. PIK3CA is the 
second most commonly mutated gene, with mutations occurring frequently (>10%) 
in most cancer types except OV, KIRC, LUAD and AML. PIK3CA 
mutations were frequent in UCEC (52%) and BRCA (33.6%), being specifically 
enriched in luminal subtype tumours. Tumours lacking PIK3CA 
mutations often had mutations in PIK3R1, with the highest occurrences 
in UCEC (31%) and GBM (11%) (Fig. 2). 

Many cancer types carried mutations in chromatin re-modelling 
genes. In particular, histone-lysine N-methyltransferase genes (MLL2 
(also known as KMT2D), MLL3 (KMT2C) and MLL4 (KMT2B)) cluster 
in bladder, lung and endometrial cancers, whereas the lysine (K)-specific 
demethylase KDM5C is prevalently mutated in KIRC (7%). 
Mutations in ARID1A are frequent in BLCA, UCEC, LUAD and 
LUSC, whereas mutations in ARID5B predominate in UCEC (10%) 
(Fig. 2). 

KRAS and NRAS mutations are typically mutually exclusive, with 
recurrent activating mutations (KRAS (Gly 12), KRAS (Gly 13) and 
NRAS (Gln 61)) common in COAD/READ (30%, 5% and 5%, respect- 
ively), UCEC (15%, 4% and 2%) and LUAD (24%, 1% and 2%). EGFR 
mutations are frequent in GBM (27%) and LUAD (11%). Recurrent, 
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Figure 2 | The 127 SMGs from 20 cellular processes in cancer identified in 
12 cancer types. Percentages of samples mutated in individual tumour types 


gain-of-function mutations in IDH1 (Arg 132) and/or IDH2 (Arg 172) 
typify GBM and AML (Supplementary Table 2 and Fig. 2). Although 
KRAS residues Gly 12 and Gly 13 are commonly mutated in LUAD, 
COAD/READ and UCEC, the proportion of Gly12Cys changes is 
significantly higher in lung cancer (P< 3.2 X 10°), resulting from 
the high C>A transversion rate (Extended Data Fig. 2). 


Tumour-type-specific mutations 

Signature mutations exclusive to KIRC include those affecting VHL 
(52%) and PBRM1 (33%) (Fig. 2). Mutations to BAP1 (10%) and 
SETD2 (12%) are also most common in KIRC. Transcription factor 
CTCF, ribosome component RPL22, and histone modifiers ARID1A 
and ARID5B have the highest frequencies in UCEC. Predominant 
COAD/READ-specific mutations are those affecting APC (82%) and 
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and Pan-Cancer are shown, with the highest percentage in each gene among 12 
cancer types in bold. 


Wnt/β-catenin signalling (93% of samples). Several mutations occur 
exclusively in AML, including recurrent mutations in NPM1 (27%) 
and FLT3 (27%), and rare mutations affecting MIR142 (Fig. 2). 
Mutations of methylation and chromatin modifiers are also typical 
in AML, mostly affecting DNMT3A and TET2. BRCA-specific muta- 
tions include GATA3 and MAP3K1, whereas KEAP1 mutations pre- 
dominate in lung cancer (LUAD 17%, LUSC 12%). EPHA3 (9%), 
SETBP1 (13%) and STK11 (9%) are characteristic in LUAD. 


Shared and cancer type-specific mutation signatures 


Cluster analysis on mutations in SMGs (Fig. 4 and Extended Data Fig. 3) 
showed 72% (1,881 of 2,611) of tumours are adjacent to those from the 
same tissue type. Notably, clustering identified several dominant groups 
in UCEC, COAD, GBM, AML, KIRC, OV and BRCA.

Figure 3 | Distribution of mutations in 127 SMGs across Pan-Cancer
cohort. Box plot displays median numbers of non-synonymous mutations,
with outliers shown as dots. In total, 3,210 tumours were used for this analysis
(hypermutators excluded).

Two major endometrial endometrioid clusters were found, one having mutations
in PIK3CA, PTEN and ARID1A, and the other containing mutations in
two additional genes (PIK3R1 and CTNNB1). Five major breast cancer
clusters were observed, with mutations in CDH1, GATA3, MAP3K1,
PIK3CA and TP53 as drivers for respective clusters. The TP53-driven
cluster is adjacent to an HNSC cluster and an ovarian cancer cluster, all
having a dearth of other SMG mutations (Fig. 4). The glioblastoma
cluster is characterized by mutations in EGFR. Two kidney clear cell
cancer clusters were detected; both have VHL as the common driver
and one has additional mutations in PBRM1 and/or BAP1 (refs 25-27).
PBRM1 and BAP1 mutations are mutually exclusive in KIRC (P =
0.006), consistent with previous reports. AML has three major clusters
represented by various combinations of DNMT3A, NPM1 and
FLT3 mutations, and one cluster dominated by RUNX1 mutations. 
One cluster having APC and KRAS mutations was almost exclusively 
detected in COAD/READ. Tumours from BLCA, HNSC, LUAD and 
LUSC are largely scattered over the Pan-Cancer cohort, indicating 
extensive heterogeneity in these diseases. 




Mutual exclusivity and co-occurrence among SMGs 


Pairwise exclusivity and co-occurrence analysis for the 127 SMGs 
found 14 mutually exclusive (false discovery rate (FDR) < 0.05) and 
148 co-occurring (FDR < 0.05) pairs (Supplementary Table 6). TP53 
and CDH1 are exclusive in BRCA, with mutations enriched in different
subtypes, as are TP53 and CTNNB1 in UCEC. Cohort analysis identified
pairs where at least one gene has mutations strongly associated
(corrected P < 0.05) to one cancer type, and also identified TP53 and
PIK3CA with significant exclusivity (Extended Data Fig. 4). Pairs with
significant co-occurrence include IDH1 and ATRX in GBM (ref. 29), TP53
and CDKN2A in HNSC, and TBX3 and MLL4 in LUAD.

Dendrix (ref. 30) identified a set of five genes (TP53, PTEN, VHL, NPM1
and GATA3) having strong mutual exclusivity (P < 0.01) (Extended
Data Fig. 5a and Supplementary Table 7). Not surprisingly, many are
associated (P < 0.05) with one cancer type (for example, VHL mutations
in KIRC), demonstrating a strong relationship between exclusivity
and tissue of origin. When 600 non-cancer-type-specific genes
were added to the analysis (Methods), we identified another set consisting
of TP53, PIK3CA, PIK3R1, SETD2 and WT1 (P < 0.01;
Extended Data Fig. 5b and Supplementary Table 7). Dendrix also
finds genes with strong mutual exclusivity from each cancer type
separately (Extended Data Fig. 5c), allowing calculation of ‘cancer
exclusivity’. KIRC has the strongest exclusivity from the other 11
cancer types, followed by AML with clear exclusivity from BRCA 
and UCEC. Conversely, COAD/READ displayed the greatest co- 
occurrence with other cancer types (Extended Data Fig. 5d). 


Clinical correlation across tumour types 


We examined how clinical features (Supplementary Table 8) correlate 
with somatic events in 127 SMGs within tumour types. Some findings 
are unsurprising, such as the correlation of TP53 mutations with gene- 
rally unfavourable indicators, for example in tumour stage (P = 0.01, 
Fisher’s exact test) and elapsed time to death (P = 0.006, Wilcoxon) in 
HNSC, age (P = 0.002, Wilcoxon rank test) and time to death (P = 
0.09, Wilcoxon) in AML, and vital status in OV (P = 0.04, Fisher). In 
UCEC, mutations in several genes are correlated with the endometrioid 
rather than serous subtype: PTEN, CTNNB1, PIK3R1, KRAS, ARID1A, 
CTCF, RPL22 and ARID5B (all P < 0.03) (Supplementary Table 9).

Figure 4 | Unsupervised clustering based on mutation status of SMGs.
Tumours having no mutation or more than 500 mutations were excluded. A
mutation status matrix was constructed for 2,611 tumours. Major clusters of
mutations detected in UCEC, COAD, GBM, AML, KIRC, OV and BRCA were
highlighted. Complete gene list shown in Extended Data Fig. 3.

We examined which genes correlate with survival using the Cox
proportional hazards model, first analysing individual cancer types 
using age and gender as covariates; an average of 2 genes (range: 0-4) 
with mutation frequency ≥2% were significant (P ≤ 0.05) in each
type (Supplementary Table 10a and Extended Data Fig. 6). KDM6A 
and ARID1A mutations correlate with better survival in BLCA 
(P = 0.03, hazard ratio (HR) = 0.36, 95% confidence interval (CI): 
0.14-0.92) and UCEC (P= 0.03, HR=0.11, 95% CI: 0.01-0.84), 
respectively, but mutations in SETBP1, recently identified with worse
prognosis in atypical chronic myeloid leukaemia (aCML) (ref. 31), have a
significant detrimental effect in HNSC (P = 0.006, HR = 3.21, 95% CI:
1.39-7.44). BAP1 strongly correlates with poor survival (P = 0.00079,
HR = 2.17, 95% CI: 1.38-3.41) in KIRC. Conversely, BRCA2 mutations
(P = 0.02, HR = 0.31, 95% CI: 0.12-0.85) associate with better
survival in ovarian cancer, consistent with previous reports (refs 32, 33); BRCA1
mutations showed positive correlation with better survival, but did not
reach significance here.

We extended our survival analysis across cancer types, restricting 
our attention to the subset of 97 SMGs whose mutations appeared 
in ≥2% of patients having survival data in ≥2 tumour types. Taking
type, age and gender as covariates, we found 7 significant genes: BAP1,
DNMT3A, HGF, KDM5C, FBXW7, BRCA2 and TP53 (Extended Data
Table 1). In particular, BAP1 was highly significant (P = 0.00013,
HR = 2.20, 95% CI: 1.47-3.29, more than 53 mutated tumours out
of 888 total), with mutations associating with detrimental outcome in 
four tumour types and notable associations in KIRC (P = 0.00079), 
consistent with a recent report (ref. 28), and in UCEC (P = 0.066). Mutations in
several other genes are detrimental, including DNMT3A (HR = 1.59),
previously identified with poor prognosis in AML (ref. 34), and KDM5C
(HR = 1.63), FBXW7 (HR = 1.57) and TP53 (HR = 1.19). TP53 has 
significant associations with poor outcome in KIRC (P = 0.012), AML 
(P = 0.0007) and HNSC (P = 0.00007). Conversely, BRCA2 (P = 0.05, 
HR = 0.62, 95% CI: 0.38 to 0.99) correlates with survival benefit in six 
types, including OV and UCEC (Supplementary Table 10a, b). IDH1 
mutations are associated with improved prognosis across the Pan- 
Cancer set (HR=0.67, P=0.16) and also in GBM (HR=0.42, 
P = 0.09) (Supplementary Table 10a, b), consistent with previous work (ref. 35).


Driver mutations and tumour clonal architecture 


To understand the temporal order of somatic events, we analysed the 
variant allele fraction (VAF) distribution of mutations in SMGs across 
AML, BRCA and UCEC (Fig. 5a and Supplementary Table 11a) and 
other tumour types (Extended Data Fig. 7). To minimize the effect of 
copy number alterations, we focused on mutations in copy neutral 
segments. Mutations in TP53 have higher VAFs on average in all three
cancer types, suggesting early appearance during tumorigenesis, although
it is possible that a later mutation contributing to tumour cell expansion
might have a high VAF.

Figure 5 | Driver initiation and progression mutations and tumour clonal
architecture. a, Variant allele fraction (VAF) distribution of mutations in
SMGs across tumours from AML, BRCA and UCEC for mutations (≥20×
coverage) in copy neutral segments. SMGs having ≥5 mutation data points
were included. ChrX, chromosome X. b, In AML sample TCGA-AB-2968
(WGS), two DNMT3A mutations are in the founding clone, and one NRAS
mutation is in the subclone. In BRCA tumour TCGA-BH-A18P (exome), one
FOXA1 mutation is in the founding clone, and PIK3R1 and MLL3 mutations
are in the subclone. In UCEC tumour TCGA-B5-A0JV (exome), PIK3CA,
ARID1A and CTCF mutations are in the founding clone, and NRAS, PTEN and
KRAS mutations are in the secondary clone. Asterisk denotes stop codon.

It is worth noting that copy neutral loss of
heterozygosity is commonly found in classical tumour suppressors 
such as TP53, BRCA1, BRCA2 and PTEN, leading to increased VAFs 
in these genes. In AML, DNMT3A (permutation test P = 0), RUNX1 
(P = 0.0003) and SMC3 (P= 0.05) have significantly higher VAFs 
than average among SMGs (Fig. 5a and Supplementary Table 11b). 
In breast cancer, AKT1, CBFB, MAP2K4, ARID1A, FOXA1 and PIK3CA 
have relatively high average VAFs. For endometrial cancer, multiple 
SMGs (for example, PIK3CA, PIK3R1, PTEN, FOXA2 and ARID1A)
have similar median VAFs. Conversely, KRAS and/or NRAS mutations 
tend to have lower VAFs in all three tumour types (Fig. 5a), suggesting 
NRAS (for example, P = 0 in AML) and KRAS (for example, P = 0.02 
in BRCA) have a progression role in a subset of AML, BRCA and 
UCEC tumours. For all three cancer types, we clearly observed a shift 
towards higher expression VAFs in SMGs versus non-SMGs, most 
apparent in BRCA and UCEC (Extended Data Fig. 8a and Methods). 
Previous analysis using whole-genome sequencing (WGS) detected
subclones in approximately 50% of AML cases; however, analysis
is difficult using AML exome data owing to its relatively few coding
mutations. Using 50 AML WGS cases, sciClone
(http://github.com/genome/sciclone) detected DNMT3A mutations in the founding clone
for 100% (8 out of 8) of cases and NRAS mutations in the subclone for 
75% (3 out of 4) of cases (Extended Data Fig. 8b). Among 304 and 160 
of BRCA and UCEC tumours, respectively, with enough coding muta- 
tions for clustering, 35% BRCA and 44% UCEC tumours contained 
subclones. Our analysis provides the lower bound for tumour hetero- 
geneity, because only coding mutations were used for clustering. In 
BRCA, 95% (62 out of 65) of cases contained PIK3CA mutations in 
the founding clone, whereas 33% (3 out of 9) of cases had MLL3 muta- 
tions in the subclone. Similar patterns were found in UCEC tumours, 
with 96% (65 out of 68) and 95% (62 out of 65) of tumours containing 
PIK3CA and PTEN mutations, respectively, in the founding clone, and 
9% (2 out of 22) of KRAS and 14% (1 out of 7) of NRAS mutations in the 
subclone (Extended Data Fig. 8b and Supplementary Table 12). 


Discussion 


We have performed systematic analysis of the TCGA Pan-Cancer muta- 
tion data set, finding key insights for cancer genomes, as summarized in 
Extended Data Fig. 9. The data set contains 127 diverse SMGs, demon- 
strating that many cellular and enzymatic processes are involved in 
tumorigenesis. Notably, 66 of them are also on the ‘mut-driver genes’ 
list generated by a ratiometric method using COSMIC mutations (ref. 38).
Although a common set of driver mutations exists in each cancer type, 
the combination of drivers within a cancer type and their distribution 
within the founding clone and subclones varies for individual patients. 
This suggests that knowing the clonal architecture of each patient’s 
tumour will be crucial for optimizing their treatment. 

Given the rate at which TCGA and International Cancer Genome 
Consortium projects are generating genomic data, there are reas- 
onable chances of identifying the ‘core’ cancer genes and pathways 
and tumour-type-specific genes and pathways in the near term. These 
results will be immediately circulated within the research community 
to assess their potential for candidate targets for diverse tumour types 
or for specific tumour types. Ultimately, these data and their associations
with different clinical features and subtypes should contribute to
the formulation of a reference candidate gene panel for all tumour 
types that could be helpful for prognosis at various clinical time points. 


METHODS SUMMARY 


Mutation data were standardized for 12 cancer types and tracked on Synapse with 
documentation (http://dx.doi.org/10.7303/syn1729383.2). All mutation annota- 
tion format files were downloaded from the TCGA data coordinating centre, each 
being reprocessed to eliminate known, recurrent false positives and germline 
single nucleotide polymorphisms (SNP) present in the dbSNP database. All vari- 
ant coordinates were converted to GRCh37 and re-annotated using the Gencode 
human transcript annotation imported from Ensembl release 69. Mutation context
(−2 to +2 bp) was calculated for each somatic variant in each mutation
category, and hierarchical clustering was then performed using the pairwise
mutation context correlation across all cancer types. The mutational significance
in cancer (MuSiC) package (ref. 3) was used to identify significant genes for both
individual tumour types and the Pan-Cancer collective. The R function ‘hclust’ was
used for complete-linkage hierarchical clustering across mutations and samples,
and Dendrix (ref. 30) was used to identify sets of approximately mutually exclusive
mutations. Cross-cancer survival analysis was based on the Cox proportional hazards
model, as implemented in the R package ‘survival’
(http://cran.r-project.org/web/packages/survival/), and the sciClone algorithm
(http://github.com/genome/sciclone) generated mutation clusters using point
mutations from copy number
neutral segments. A complete description of the materials and methods used to 
generate this data set and its results is provided in the Methods. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 


Received 21 June; accepted 13 September 2013. 


1. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in
whole genome sequencing data. Bioinformatics 28, 311-317 (2012).

2. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration
discovery in cancer by exome sequencing. Genome Res. 22, 568-576 (2012).

3. Dees, N. D. et al. MuSiC: Identifying mutational significance in cancer genomes.
Genome Res. 22, 1589-1598 (2012).

4. Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic
mutations in normal/tumour paired next-generation sequencing data.
Bioinformatics 28, 907-913 (2012).

5. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and
heterogeneous cancer samples. Nature Biotechnol. 31, 213-219 (2013).

6. Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by
global genomic analyses. Science 321, 1801-1806 (2008).

7. Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma
multiforme. Science 321, 1807-1812 (2008).

8. Sjöblom, T. et al. The consensus coding sequences of human breast and colorectal
cancers. Science 314, 268-274 (2006).

9. The Cancer Genome Atlas Research Network. Comprehensive genomic
characterization defines human glioblastoma genes and core pathways. Nature
455, 1061-1068 (2008).

10. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma.
Nature 455, 1069-1075 (2008).

11. Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers.
Science 318, 1108-1113 (2007).

12. The Cancer Genome Atlas Research Network. Integrated genomic analyses of
ovarian carcinoma. Nature 474, 609-615 (2011).

13. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human
breast tumours. Nature 490, 61-70 (2012).

14. The Cancer Genome Atlas Research Network. Integrated genomic characterization of
endometrial carcinoma. Nature 497, 67-73 (2013).

15. The Cancer Genome Atlas Research Network. Genomic and epigenomic
landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368,
2059-2074 (2013).

16. The Cancer Genome Atlas Network. Comprehensive molecular characterization of
human colon and rectal cancer. Nature 487, 330-337 (2012).

17. Ellis, M. J. et al. Whole-genome analysis informs breast cancer response to
aromatase inhibition. Nature 486, 353-360 (2012).

18. The Cancer Genome Atlas Research Network. Comprehensive molecular
characterization of clear cell renal cell carcinoma. Nature 499, 43-49 (2013).

19. Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57-70 (2000).

20. Downing, J. R. et al. The Pediatric Cancer Genome Project. Nature Genet. 44,
619-622 (2012).

21. Ma, Z. & Leijon, A. Bayesian estimation of beta mixture models with variational
inference. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2160-2173 (2011).

22. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new
cancer-associated genes. Nature 499, 214-218 (2013).

23. Tao, M. H. & Freudenheim, J. L. DNA methylation in endometrial cancer.
Epigenetics 5, 491-498 (2010).

24. Etcheverry, A. et al. DNA methylation in glioblastoma: impact on gene expression
and clinical outcome. BMC Genomics 11, 701 (2010).

25. Varela, I. et al. Exome sequencing identifies frequent mutation of the SWI/SNF
complex gene PBRM1 in renal carcinoma. Nature 469, 539-542 (2011).

26. Peña-Llopis, S. et al. BAP1 loss defines a new class of renal cell carcinoma. Nature
Genet. 44, 751-759 (2012).

27. Clapier, C. R. & Cairns, B. R. The biology of chromatin remodeling complexes. Annu.
Rev. Biochem. 78, 273-304 (2009).

28. Kapur, P. et al. Effects on survival of BAP1 and PBRM1 mutations in sporadic clear-
cell renal-cell carcinoma: a retrospective analysis with independent validation.
Lancet Oncol. 14, 159-167 (2013).

29. Jiao, Y. et al. Frequent ATRX, CIC, FUBP1 and IDH1 mutations refine the
classification of malignant gliomas. Oncotarget 3, 709-722 (2012).

30. Vandin, F., Upfal, E. & Raphael, B. J. De novo discovery of mutated driver pathways
in cancer. Genome Res. 22, 375-385 (2012).




©2013 Macmillan Publishers Limited. All rights reserved 


31. Piazza, R. et al. Recurrent SETBP1 mutations in atypical chronic myeloid leukemia.
Nature Genet. 45, 18-24 (2013).

32. Yang, D. et al. Association of BRCA1 and BRCA2 mutations with survival,
chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian
cancer. J. Am. Med. Assoc. 306, 1557-1565 (2011).

33. Bolton, K. L. et al. Association between BRCA1 and BRCA2 mutations and survival
in women with invasive epithelial ovarian cancer. J. Am. Med. Assoc. 307, 382-390
(2012).

34. Ley, T. J. et al. DNMT3A mutations in acute myeloid leukemia. N. Engl. J. Med. 363,
2424-2433 (2010).

35. Myung, J. K. et al. IDH1 mutation of gliomas with long-term survival analysis. Oncol.
Rep. 28, 1639-1644 (2012).

36. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by
whole-genome sequencing. Nature 481, 506-510 (2012).

37. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia.
Cell 150, 264-278 (2012).

38. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546-1558 (2013).


Supplementary Information is available in the online version of the paper. 


Acknowledgements This work was supported by the National Cancer Institute grants
R01CA180006 to L.D. and P01CA101937 to T.J.L., and National Human Genome
Research Institute grants R01HG005690 to B.J.R., U54HG003079 to R.K.W., and
U01HG006517 to L.D., and National Science Foundation grant IIS-1016648 to B.R.
We gratefully acknowledge the contributions from TCGA Research Network and its 
TCGA Pan-Cancer Analysis Working Group. We also thank M. Bharadwaj for technical 
assistance. 


Author Contributions L.D. and R.K.W. supervised the research. L.D., C.K., M.D.M., F.V.,
K.Y., B.N., C.L., M.X., M.D.M.L., M.A.W., J.F.M., M.J.W., C.A.M., J.S.W. and B.J.R. analysed the
data. M.C.W. and Q.Z. performed statistical analysis. M.D.M., C.K., F.V., C.L., M.X., K.Y.,
B.N., Q.Z., M.C.W., J.F.M., M.D.M., M.A.W. and L.D. prepared the figures and tables. L.D.,
T.J.L., C.K. and B.J.R. conceived and designed the experiments. L.D., M.D.M., C.K., F.V.,
C.L., B.J.R., K.Y. and M.C.W. wrote the manuscript.


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 


Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to L.D. (lding@genome.wustl.edu).


This work is licensed under a Creative Commons Attribution-
NonCommercial-Share Alike 3.0 Unported licence. To view a copy of this
licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0


17 OCTOBER 2013 | VOL 502 | NATURE | 339 




METHODS 


Standardization and tracking of mutation data from 12 cancer types. The three 
TCGA genome sequencing centres (GSCs; Baylor Human Genome Center, Broad 
Institute, and The Genome Institute at Washington University) collectively per- 
formed exome sequencing on thousands of tumour samples and matched normal 
tissues, the latter being used as controls to distinguish somatic mutations from 
inherited variants. These controls were most often peripheral blood, but skin 
tissue was used in 199 AML samples as well as 1 buccal source, and adjacent 
tumour-free tissue was used for 927 cases, with 120 cases having normal DNA 
from blood and adjacent solid normal tissue. 

Exome capture targets may differ among GSCs, as well as across cohorts at the 
same GSC because the capture technologies and sequencing platforms continue 
to evolve over time. Therefore, collecting sequencing coverage data for each 
sample is crucial for most variant significance analyses. Somatic variant calling 
methods also differ among GSCs for similar reasons, in addition to the fact that 
filtering strategies may be tuned to emphasize either sensitivity or specificity of 
calls. Finally, the TCGA disease analysis working groups (AWGs) may optionally 
perform manual curation of the variant calls, in which false positives are removed 
and false negatives are recovered. AWGs and GSCs also collaboratively select
putative variants for validation or for recovering variants from regions that 
reported low coverage in the first pass of exome sequencing. These steps mean 
that somatic variant sensitivity and specificity are mostly comparable across 
samples of a given TCGA tumour type, but that they differ considerably among 
tumour types, creating significant challenges for Pan-Cancer analyses. 

Complete standardization of sensitivity could not be attained, as it would have 
required a uniform variant calling and filtering workflow across all tumour- 
normal pairs. Instead, publicly available somatic variant calls in mutation annota- 
tion format (MAF) files from the TCGA were used to both ensure reproducibility 
and take advantage of extensive manual curation performed over the years by 
experts in the disease or in genomic sequence analysis and annotation. Specifically, 
all MAF files were downloaded from the TCGA data coordinating centre, each 
being reprocessed to eliminate known, recurrent false positives and germline single 
nucleotide polymorphisms (SNP) present in the dbSNP database. All variant 
coordinates were transferred to GRCh37 and re-annotated using the Gencode 
human transcript annotation imported from Ensembl release 69. Per sample, 
per gene coverage values were obtained using WIG-formatted reference coverage 
files associated with the BAMs or by processing the original BAM files directly. 
Details were tracked on Synapse with provenance and documentation
(https://www.synapse.org/#!Synapse:syn1729383).

Mutation frequency and spectrum analysis. We calculate mutation frequency 
by dividing the number of validated somatic variants by the number of base pairs 
that have sufficient coverage. Minimum coverage is six and eight reads for normal 
and tumour BAMs, respectively. For mutation spectrum we classify the mutation 
by six types (transitions/transversions). Mutation context is generated by count- 
ing the frequency of A, T, C and G nucleotides that are 2 bp 5’ and 3’ to each 
variant within the six mutation categories. For the clustering, we pooled all 
samples (excluding hypermutators having >500 mutations) for each cancer type. 
We calculated the mutation context (−2 to +2 bp) for each somatic variant in
each mutation category. A hierarchical clustering was then done using the pair- 
wise correlation of the mutation context across all cancer types. We used correla- 
tion modules in the mutational significance in cancer (MuSiC) package to identify 
genes with mutations that are positively correlated with the number of mutations 
in the tumour sample. This analysis was performed for all 12 cancer types. Only 
genes mutated in at least 5% of tumours were included in the analysis. A list of 
genes known to be involved in DNA mismatch repair is included in Supplementary
Table 13.
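
The frequency and spectrum calculations described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names are hypothetical, and folding each substitution onto a pyrimidine (C or T) reference base is an assumed convention for arriving at six classes, since the text does not say how strands were collapsed.

```python
# Complement map, used to report every substitution relative to a C or T reference base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref, alt):
    """Return one of six strand-collapsed classes (assumed pyrimidine convention)."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different nucleotides from A, C, G, T")
    if ref in ("A", "G"):  # purine reference: describe the change on the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(variants):
    """Count variants, given as (ref, alt) pairs, in each of the six classes."""
    counts = {c: 0 for c in SIX_CLASSES}
    for ref, alt in variants:
        counts[substitution_class(ref, alt)] += 1
    return counts


def mutation_frequency(n_variants, covered_bp):
    """Somatic variants divided by the number of sufficiently covered base pairs."""
    if covered_bp <= 0:
        raise ValueError("covered_bp must be positive")
    return n_variants / covered_bp
```

Multiplying the returned frequency by 1e6 gives mutations per megabase, a common reporting unit.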

SMG analysis. We used the SMG test in the MuSiC suite (ref. 3) to identify significant
genes for each tumour type and also for Pan-Cancer tumours. This test assigns
mutations to seven categories: AT transition, AT transversion, CG transition, CG
transversion, CpG transition, CpG transversion and indel, and then uses statistical
methods based on convolution, the hypergeometric distribution (Fisher's
test), and likelihood to combine the category-specific binomials to obtain overall
P values. All P values were combined using the methods described previously (ref. 3).
SMGs are listed in Fig. 2. Finally, for the analysis of SMGs, genes not typically
expressed in individual tumour type and/or Pan-Cancer tumour samples were
filtered if they had an average read per kilobase per million (RPKM) ≤ 0.5. For the
RNA sequencing (RNA-seq)-based gene expression analysis, we used the
‘Pancan12 per-sample log2-RSEM’ matrix from Synapse
(https://www.synapse.org/#!Synapse:syn1734155). A gene qualified as ‘expressed’ if it had at least three
reads in at least 70% of samples. Annotation-based curation was also performed.
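
The two expression criteria in this paragraph are simple thresholds, and a minimal sketch may make them concrete. The function names and input shapes below are hypothetical; the thresholds (three reads, 70% of samples, RPKM of 0.5) are the ones quoted in the text.

```python
def is_expressed(read_counts, min_reads=3, min_fraction=0.7):
    """True if at least `min_fraction` of samples have at least `min_reads` reads."""
    if not read_counts:
        return False
    n_pass = sum(1 for reads in read_counts if reads >= min_reads)
    return n_pass / len(read_counts) >= min_fraction


def passes_rpkm_filter(rpkm_values, threshold=0.5):
    """True if the average RPKM is above the threshold (genes at or below it are filtered)."""
    if not rpkm_values:
        return False
    return sum(rpkm_values) / len(rpkm_values) > threshold
```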

Tumour specificity analysis. To make quantitative inferences as to the number
of cancer types with which an individual gene associates, we calculated the
empirical distributions of frequency for each cancer (tissue) type and declared
an association (setting indicator variable to 1) if a given gene frequency within a
type exceeded a threshold. Otherwise we set the indicator to 0, indicating no 
association. We took the threshold as a standardized Z-score of 0.2 above the 
mean based on the estimated level of noise in the 127 significant genes as quan- 
tified by the coefficient of variation for each cancer type. We then computed an 
overall distribution on the indicator variable. The mean for each functional 
category having at least five genes was then converted to a Z-score based on the 
descriptive statistics (mean and standard deviation) of the indicator distribution. 
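
The indicator step can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the per-type mean and standard deviation are taken over the genes supplied, and the population standard deviation is used because the text does not specify which estimator was applied.

```python
from statistics import mean, pstdev


def association_indicators(freq_by_type, z_threshold=0.2):
    """Map {cancer_type: {gene: frequency}} to {gene: {cancer_type: 0 or 1}}.

    A gene is flagged (1) in a cancer type when its frequency lies more than
    `z_threshold` standard deviations above that type's mean frequency.
    """
    indicators = {}
    for cancer_type, gene_freqs in freq_by_type.items():
        mu = mean(gene_freqs.values())
        sd = pstdev(gene_freqs.values())  # assumption: population SD
        for gene, freq in gene_freqs.items():
            z = (freq - mu) / sd if sd > 0 else 0.0
            indicators.setdefault(gene, {})[cancer_type] = 1 if z > z_threshold else 0
    return indicators


def associated_type_count(indicators):
    """Number of cancer types each gene is associated with."""
    return {gene: sum(flags.values()) for gene, flags in indicators.items()}
```

With toy frequencies in which VHL is frequent only in KIRC and PTEN only in UCEC, each of those genes is associated with one type, and a gene near or below the mean in every type is associated with none.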
Unsupervised clustering. Somatic point mutations and small indels of 127 SMGs 
across the 3,281 tumours were collected. To reduce noise from passenger muta- 
tions, tumours having more than 500 somatic mutations (considered hypermu- 
tators) were excluded from this analysis. Tumours with zero detected somatic 
mutations were also excluded, resulting in mutations from 2,611 tumours for 
downstream clustering analysis. A mutation status matrix (sample × gene) was
constructed and passed to the R function ‘hclust’ for complete-linkage hierarchical
clustering and a respective heatmap with a dendrogram was plotted. Tumours
from BLCA, HNSC, LUAD and LUSC are largely scattered over the Pan-Cancer
cohort, indicating extensive heterogeneity in these diseases. For instance, eight
LUAD and two LUSC tumours are in the solid colorectal cluster, largely owing to
their KRAS mutations (Fig. 4). Three UCEC, two GBM, one OV and one HNSC
samples are in the BRCA cluster courtesy of TP53 and PIK3CA mutations. The
resolution of this analysis could be improved by incorporating copy number data, 
structural variants, gene expression, proteomics and methylation. 
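
For readers who want to see the shape of this procedure, the following Python sketch builds a binary mutation status matrix and runs a naive complete-linkage agglomeration. It is not the R ‘hclust’ call used in the study: the function names are hypothetical, the Hamming distance is an assumed dissimilarity (the text does not name one), and "zero detected somatic mutations" is interpreted here as no mutated SMG.

```python
def mutation_status_matrix(tumours, genes, max_mutations=500):
    """Build {sample: [0/1 per gene]} from {sample: (total_mutations, mutated_smgs)}.

    Hypermutators (more than `max_mutations` mutations) and tumours with no
    mutated SMG are dropped, mirroring the exclusions described above.
    """
    rows = {}
    for sample, (total, mutated) in tumours.items():
        if total > max_mutations or not mutated:
            continue
        rows[sample] = [1 if g in mutated else 0 for g in genes]
    return rows


def hamming(a, b):
    """Number of genes whose mutation status differs between two tumours."""
    return sum(x != y for x, y in zip(a, b))


def complete_linkage(rows, n_clusters):
    """Merge the two clusters with the smallest maximum pairwise distance until
    `n_clusters` remain. Quadratic and slow; intended for small examples only."""
    clusters = [[s] for s in sorted(rows)]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(hamming(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

On real cohorts a library implementation of hierarchical clustering would be preferable; the loop above only shows what "complete linkage" means.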

Dendrix analysis of mutation relationships. We used Fisher’s exact test to 
identify pairs of SMGs with significant (FDR ≤ 0.05 by Benjamini-Hochberg)
exclusivity and co-occurrence. We identified significant pairs by analysing all 
samples together and by analysing samples in each cancer type separately. A large 
number of pairs (142) with significant co-occurring mutations is identified only 
considering the Pan-Cancer data set (Extended Data Fig. 4); these pairs include 
several candidate genes (for example, NAV3, RPL22 and TSHZ3) whose function 
in carcinogenesis is not well characterized. 
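
The pairwise test and the multiple-testing correction can be sketched with the Python standard library alone. The code below is an illustration rather than the authors' implementation: the function names are hypothetical, and the use of one-sided tails (lower tail for exclusivity, upper tail for co-occurrence) is an assumption, because the text does not state which alternative was tested.

```python
from math import comb


def fisher_tails(both, only_a, only_b, neither):
    """One-sided Fisher's exact test P values for a 2 x 2 mutation table.

    Returns (p_exclusive, p_cooccur): the hypergeometric probability of seeing
    at most, or at least, `both` samples mutated in both genes.
    """
    n_a = both + only_a
    n_b = both + only_b
    n = both + only_a + only_b + neither
    denom = comb(n, n_b)

    def pmf(k):
        return comb(n_a, k) * comb(n - n_a, n_b - k) / denom

    lo = max(0, n_a + n_b - n)
    hi = min(n_a, n_b)
    p_exclusive = sum(pmf(k) for k in range(lo, both + 1))
    p_cooccur = sum(pmf(k) for k in range(both, hi + 1))
    return min(1.0, p_exclusive), min(1.0, p_cooccur)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pair would then be called significant when its adjusted value is at or below the chosen FDR threshold.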

We applied our de novo driver exclusivity (Dendrix) algorithm to identify sets 
of approximately mutually exclusive mutations on all samples together. Dendrix 
finds a set M of genes of maximum weight W(M), in which W(M) is the difference 
between the number of samples with a mutation in M and the number of samples 
with more than one mutation in M. The maximum scoring set of genes of a given 
size is identified using a Markov chain Monte Carlo approach. We first applied 
Dendrix considering only the 127 SMGs, and then extended our analysis to 
consider the 1,000 genes of smallest median q value among the three q values 
(convolution, Fisher’s combined test, and likelihood ratio) reported by MuSiC. 
From these 1,000 genes, we discarded the ones with mutations strongly associated 
(Bonferroni corrected P ≤ 0.05 by Fisher’s exact test) with a cancer type, and this
resulted in 600 genes for Dendrix analysis. 
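
The weight W(M), as worded above, can be computed directly. The sketch below follows that wording and is not the Dendrix code: the function names are hypothetical, and an exhaustive search over small gene sets is swapped in for the Markov chain Monte Carlo search, which is only practical for toy inputs.

```python
from itertools import combinations


def dendrix_weight(gene_set, mutated_genes_by_sample):
    """W(M): samples with a mutation in M minus samples with more than one."""
    covered = 0
    multiply_hit = 0
    for genes in mutated_genes_by_sample.values():
        hits = len(gene_set & genes)
        if hits >= 1:
            covered += 1
        if hits > 1:
            multiply_hit += 1
    return covered - multiply_hit


def best_set_exhaustive(candidate_genes, mutated_genes_by_sample, size):
    """Highest-weight gene set of a given size, found by brute force
    (a stand-in for the MCMC search; feasible only for few candidates)."""
    return max(
        (frozenset(c) for c in combinations(sorted(candidate_genes), size)),
        key=lambda m: dendrix_weight(m, mutated_genes_by_sample),
    )
```

A set scores highly when it covers many samples while few samples carry mutations in more than one of its genes, which is the sense of "approximately mutually exclusive" here.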

We also applied Dendrix to identify sets of exclusive mutations among the
SMGs in each cancer type separately. The exclusivity and co-occurrence between
different sets of mutations was assessed using Fisher’s exact test, with the mutation
status (‘mutated’ if at least one gene in the set is mutated, ‘not mutated’ otherwise)
of the two sets as the categories of the 2 × 2 contingency table.

Cross-cancer-type survival analysis using Cox proportional hazards model.
We used the clinical correlation module of MuSiC to examine how clinical fea- 
tures correlate with somatic events within individual tumour types. Fisher’s exact 
test was used for analysing categorical features, and the Wilcoxon rank sum test 
was used for quantitative variables. We used the standard Cox proportional 
hazards model for individual cancer types as well as cross-cancer survival analysis,
as implemented in the R package ‘survival’
(http://cran.r-project.org/web/packages/survival/). Here, the effects of all mutations are taken to be constant
over time, that is, they are ‘stationary coefficients’. Hazard ratios exceeding 1 
indicate an overall detrimental effect across cancers, whereas those below 1 
associate with better outcome. Calculations included only those genes having 
mutation frequencies of at least 2% (for both individual cancer types and Pan- 
Cancer) in at least two cancer types (for Pan-Cancer). In practice, this means that 
although some genes encompass many types, for example, TP53 is calculated over 
12 types, most are based on only a few (Extended Data Table 1 and Supplemen- 
tary Table 10a, b). There is no basis for calculation of genes having 2% mutation in 
only a single type, although single-type analysis was computed similarly for all 
genes having at least 2% in all cancer types. Analyses were performed using age 
and gender as covariates. 
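
The link between a Cox regression coefficient, its hazard ratio and the 95% confidence interval can be made concrete with a short sketch. It does not fit a Cox model (the study used the R package ‘survival’ for that); it only shows the arithmetic, with hypothetical function names, and it assumes a two-sided Wald test for the P value, which the text does not specify.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def hazard_ratio_summary(beta, se):
    """Hazard ratio, 95% CI and two-sided Wald P value from a coefficient and its SE."""
    hr = exp(beta)
    ci = (exp(beta - Z_95 * se), exp(beta + Z_95 * se))
    p = erfc(abs(beta / se) / sqrt(2))
    return hr, ci, p


def coefficient_from_reported(hr, ci_low, ci_high):
    """Recover the log-scale coefficient and SE from a reported HR and 95% CI."""
    beta = log(hr)
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    return beta, se
```

For example, feeding in a reported HR of 2.17 with a 95% CI of 1.38 to 3.41 recovers a positive coefficient, that is, a hazard ratio above 1 and therefore a detrimental effect in the sense described above.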

We also performed Pan-Cancer survival analysis by stratifying on cancer type 
in the Cox regression model and found the results are largely consistent with 
taking cancer type as a covariate (14 out of top 15 significant genes are overlap- 
ping from these two analyses) (Supplementary Table 10b). Furthermore, stage 
was used as a covariate for individual cancer type survival analysis (AML and 
COAD/READ not included). Again, the results are fairly consistent with taking
only age and gender as covariates (for example, 12 out of top 15 significant genes
are overlapping from these two analyses for UCEC) (Supplementary Table 14). 
Clonality and mutation VAF analysis. We computed the VAFs of somatic 
mutations in SMGs using TCGA targeted validation data and/or exome and
RNA sequencing data for AML, BRCA and UCEC. An internally developed tool 
called Bam2ReadCount (unpublished), which counts the number of reads sup- 
porting the reference and variant alleles, was used for computing VAFs for point 
mutations and short indels in copy number neutral segments. Only mutation sites 
having ≥20× coverage and SMGs having at least five data points were included
in downstream analyses. Permutation and t-tests were used to identify genes 
with significantly higher or lower VAFs than the average (Supplementary 
Table 11a, b). These indicate chronological order-of-appearance of somatic
events during tumorigenesis. VAFs for mutations from genes that are not iden- 
tified as significantly mutated were similarly computed for generating control 
VAF density distribution. We also computed VAF distribution for the other nine 
cancer types, and plots are included in Extended Data Fig. 7. In total, 91 BLCA, 
772 BRCA, 144 COAD/READ, 62 GBM, 144 HNSC, 195 KIRC, 197 LAML, 216 
LUAD, 146 LUSC, 278 OV and 248 UCEC tumours were used for SMG VAF 
distribution analysis. 
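The filtering rules above can be illustrated with a short sketch. Bam2ReadCount is unpublished, so the input format used here (gene, reference read count, variant read count per site) and the function names are assumptions; only the thresholds of 20× coverage and five data points per gene come from the text.

```python
def vaf(ref_reads, var_reads):
    """Variant allele fraction: variant reads over total reads at a site."""
    return var_reads / (ref_reads + var_reads)


def gene_vafs(sites, min_coverage=20, min_points=5):
    """Group per-site VAFs by gene, applying the coverage and count filters.

    `sites` is an iterable of (gene, ref_reads, var_reads) tuples
    (a hypothetical format). Sites below `min_coverage` total reads are
    dropped, then genes with fewer than `min_points` remaining sites.
    """
    by_gene = {}
    for gene, ref_reads, var_reads in sites:
        if ref_reads + var_reads >= min_coverage:
            by_gene.setdefault(gene, []).append(vaf(ref_reads, var_reads))
    return {g: v for g, v in by_gene.items() if len(v) >= min_points}
```

The per-gene lists returned here would then feed the permutation and t-tests that compare each gene against the average VAF.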
We further investigated the expression level of somatic mutations using available
RNA sequencing data for AML, BRCA and UCEC, and then compared
observed mutant allele expressions with expected levels based on DNA VAFs 
(assuming no allelic expression bias). A total of 671 BRCA, 170 AML and 190 
UCEC tumours with RNA-seq BAMs were used for this analysis. Notably, we 
observed at least a twofold increase of variant allele expressions in 3.9%, 12.9% 
and 5.9% of mutations from SMGs in AML (for example, TP53, STAG2 and 
SMC3), BRCA (for example, CDH1, TP53, GATA3 and MLL3), and UCEC (for 
example, ARID1A and FGFR2), respectively (Supplementary Table 11a). We
further compared expression level distributions across mutations from SMGs 
and non-SMGs. For all three cancer types, we clearly observed a shift towards 
higher expression VAFs in SMGs versus non-SMGs, which was most apparent in 
BRCA and UCEC (Extended Data Fig. 8a). This result suggests potential selection 
of these mutations during tumorigenesis. 
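The comparison of observed and expected mutant allele expression can be sketched as a fold change of RNA VAF over DNA VAF, with the DNA VAF serving as the expected level under no allelic expression bias. The function names and the paired-input format are assumptions; the twofold threshold comes from the text.

```python
def expression_fold_change(dna_vaf, rna_vaf):
    """Observed (RNA) over expected (DNA) variant allele fraction."""
    return rna_vaf / dna_vaf


def fraction_overexpressed(pairs, threshold=2.0):
    """Share of mutations whose RNA VAF is at least `threshold` times the DNA VAF.

    `pairs` is an iterable of (dna_vaf, rna_vaf) tuples (hypothetical format);
    mutations with a DNA VAF of zero are skipped.
    """
    folds = [expression_fold_change(d, r) for d, r in pairs if d > 0]
    return sum(f >= threshold for f in folds) / len(folds)
```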

SciClone (http://github.com/genome/sciclone) was used for generating muta- 
tion clusters using point mutations from copy number neutral segments. Only 
variants with greater than or equal to 100× coverage were used for clustering and
plotting. Validation data were used for AML, and exome sequencing data were 
used for BRCA and UCEC. SMGs were highlighted automatically by sciClone to 
show their clonal association (Extended Data Fig. 8b). 


[Figure: legend, Base (A, C, G, T); x axis, Relative Position.]

Extended Data Figure 1 | Mutation context across 12 cancer types. Mutation context showing proportions of A, T, C and G nucleotides [...] across all 12 cancer types. The y axis denotes the total number of mutations in each category.

genes that are strongly associated with one cancer type. b, The highest scoring
sets identified by Dendrix in individual cancer types. Eight types include TP53
co-occurrence with other types.



Extended Data Figure 6 | Kaplan-Meier plots for genes significantly associated with survival. Plots are shown for 24 genes showing significant (P ≤ 0.05) association in individual cancer types. Although NPM1 mutations in patients with AML having intermediate cytogenetic risk are relatively benign in the absence of internal tandem duplications in FLT3, we did not stratify patients based on cytogenetics or FLT3 internal tandem duplication status in this analysis, and cannot discern this effect. Because most patients with OV (95%) have TP53 mutations, we could not obtain sufficient non-TP53 mutant controls for confidently dissecting the relationship between TP53 status and survival in OV.

Extended Data Figure 8 | Mutation expression and tumour clonal architecture in AML, BRCA and UCEC. a, Density plots of expressed VAFs for mutations in SMGs (blue) and non-SMGs (red). b, SciClone clonality example plots for AML (validation data), BRCA and UCEC. Two plots are shown for each case: kernel density (top), followed by the plot of tumour VAF by sequence depth for sites from selected copy number neutral regions. Mutations (with annotations) in SMGs are shown.

Extended Data Figure 9 | Summary of major findings in Pan-Cancer 12. Systematic analysis of the TCGA Pan-Cancer mutation dataset identifies SMGs, cancer-related cellular processes, and genes associated with clinical features and tumour progression. Panel text:

Mutation rate. Distribution of mutation rates across the twelve cancer types reveals interesting features, such as clusters in UCEC and COAD/READ that indicate factors other than age in the development of these tumors.

Environmental effects on cancer development can also be observed in mutation spectrum. For instance, lung tumors show higher proportions of C-to-A transversions, a signature of cigarette smoke exposure.

When grouped by mutation, we see that significant mutations fall into several distinct categories: transcription factors & regulators, histone modifiers, genome integrity, RTK signaling, cell cycle, and more.

Mutation relation. We found 14 significant mutually exclusive pairs and 148 co-occurring pairs. We also identified a set consisting of TP53, PIK3CA, PIK3R1, SETD2, and WT1.

Clinical features. We found TP53 to be significant, with mutations being associated with detrimental outcome through joint analysis of 12 tumor types. Mutations in BAP1 are correlated with detrimental outcome particularly in KIRC and UCEC.

Clonal architecture. Mutations in TP53, DNMT3A, and PIK3CA play an initiation role in the tumorigenesis. Mutations in KRAS and/or NRAS largely play a progression role in the tumorigenesis of AML, BRCA, and UCEC.
Extended Data Table 1 | Clinical correlation and survival analysis for genes mutated at ≥2% frequency in at least 2 tumour types

Columns 2 to 4 are the cross-cancer inputs (number of cancer types, total tumours, mutated tumours). Columns 5 to 9 are the individual cancer component inputs and results (cancer type, total tumours, mutated tumours, hazard ratio with 95% CI, P value). Columns 10 and 11 are the overall cross-cancer results (hazard ratio with 95% CI, P value).

Gene    Types  Total  Mutated  Cancer type  Total  Mutated  HR (95% CI)        P value   Cross-cancer HR (95% CI)  P value
BAP1    4      888    53       BLCA         92     3        3.57 (0.43:29.71)  0.24      2.2 (1.47:3.29)           0.00013
                               KIRC         416    42       2.17 (1.38:3.41)   0.00079
                               LUAD         150    3        2.13 (0.49:9.21)   0.31
                               UCEC         230    5        4.16 (0.91:18.97)  0.066
DNMT3A  3      507    60       LAML         186    47       1.45 (0.97:2.16)   0.07      1.59 (1.11:2.27)          0.011
                               LUAD         150    6        0.86 (0.19:3.76)   0.84
                               LUSC         171    7        1.62 (0.56:4.68)   0.37
HGF     3      621    32       HNSC         300    8        0.69 (0.22:2.19)   0.53      0.47 (0.23:0.96)          0.038
                               LUAD         150    14       0.52 (0.16:1.74)   0.29
                               LUSC         171    10       0.39 (0.1:1.62)    0.2
KDM5C   4      967    44       KIRC         416    27       1.72 (0.95:3.11)   0.073     1.63 (1.02:2.6)           0.04
                               LUAD         150    7        0.9 (0.28:2.97)    0.87
                               LUSC         171    5        2.44 (0.74:7.99)   0.14
                               UCEC         230    5        7.78 (0.92:65.6)   0.059
FBXW7   6      986    82       BLCA         92     9        1.56 (0.5:4.8)     0.44      1.57 (1.02:2.44)          0.042
                               COAD/READ    193    22       2.18 (0.62:7.6)    0.22
                               HNSC         300    15       0.81 (0.35:1.88)   0.62
                               LUSC         171    9        2.37 (1.06:5.28)   0.036
                               UCEC         230    27       1.93 (0.55:6.84)   0.31
BRCA2   6      1250   54       BLCA         92     6        0.98 (0.13:7.51)   0.99      0.62 (0.38:0.99)          0.047
                               HNSC         300    11       0.93 (0.41:2.15)   0.87
                               LUAD         150    7        0.88 (0.27:2.92)   0.84
                               LUSC         171    10       0.69 (0.25:1.94)   0.49
                               OV           307    10       0.31 (0.12:0.85)   0.022
                               UCEC         230    10       4e-8 (0:inf)       1
TP53    12     3083   1290     BLCA         92     47       0.98 (0.45:2.17)   0.97      1.19 (1.0:1.41)           0.049
                               BRCA         763    251      1.36 (0.87:2.14)   0.18
                               COAD/READ    193    113      0.63 (0.22:1.77)   0.38
                               GBM          275    80       0.78 (0.55:1.1)    0.16
                               HNSC         300    209      2.58 (1.62:4.13)   7.00E-05
                               KIRC         416    9        3.16 (1.28:7.78)   0.012
                               LAML         186    14       2.76 (1.54:4.96)   0.0007
                               LUAD         150    78       1.13 (0.63:2.0)    0.68
                               LUSC         171    135      0.73 (0.43:1.24)   0.25
                               OV           307    290      0.53 (0.30:0.92)   0.025
                               UCEC         230    64       1.72 (0.61:4.87)   0.31


Reprogramming in vivo produces teratomas
and iPS cells with totipotency features 


Maria Abad, Lluc Mosteiro, Cristina Pantoja, Marta Cañamero, Teresa Rayon, Inmaculada Ors, Osvaldo Graña, Diego Megias,
Orlando Dominguez, Dolores Martinez, Miguel Manzanares, Sagrario Ortega & Manuel Serrano


Reprogramming of adult cells to generate induced pluripotent stem cells (iPS cells) has opened new therapeutic opportunities; 
however, little is known about the possibility of in vivo reprogramming within tissues. Here we show that transitory induction 
of the four factors Oct4, Sox2, Klf4 and c-Myc in mice results in teratomas emerging from multiple organs, implying that full
reprogramming can occur in vivo. Analyses of the stomach, intestine, pancreas and kidney reveal groups of dedifferentiated 
cells that express the pluripotency marker NANOG, indicative of in situ reprogramming. By bone marrow transplantation, we 
demonstrate that haematopoietic cells can also be reprogrammed in vivo. Notably, reprogrammable mice present circulating 
iPS cells in the blood and, at the transcriptome level, these in vivo generated iPS cells are closer to embryonic stem cells (ES 
cells) than standard in vitro generated iPS cells. Moreover, in vivo iPS cells efficiently contribute to the trophectoderm 
lineage, suggesting that they achieve a more plastic or primitive state than ES cells. Finally, intraperitoneal injection of in 
vivo iPS cells generates embryo-like structures that express embryonic and extraembryonic markers. We conclude that 
reprogramming in vivo is feasible and confers totipotency features absent in standard iPS or ES cells. These discoveries
could be relevant for future applications of reprogramming in regenerative medicine.


Reprogramming into pluripotency remains an intense field of investigation
that is providing many insights about cellular plasticity.
Cellular reprogramming has been achieved under carefully controlled
in vitro culture conditions, whereas the in vivo tissue microenvironment
is, in principle, conducive to cellular differentiation and opposed
to reprogramming. However, we took note of remarkable examples in
mice in which the normally irreversible state of cellular differentiation
has been altered, inducing direct conversions in vivo from one cell type
into a different one. Encouraged by these precedents, we have
attempted to achieve reprogramming in vivo.


Generation of reprogrammable mice 


We have generated reprogrammable mice similar, but not identical, to
others previously described. A total of four transgenic mouse lines
were obtained, each one carrying the transcriptional activator (rtTA)
within the ubiquitously expressed Rosa26 locus and a single copy of
a lentiviral doxycycline-inducible polycistronic cassette encoding the
four murine factors Oct4 (also known as Pou5f1), Sox2, Klf4 and
c-Myc (Fig. 1a; Extended Data Fig. 1a). In two of the four transgenic
lines, the cassette was highly induced in most tissues (Extended Data
Fig. 1b) and mouse embryonic fibroblasts (MEFs) from these lines
were efficiently reprogrammed in vitro upon addition of doxycycline
(Extended Data Fig. 1c, d). The other two transgenic lines did not
express the cassette and their derived MEFs did not reprogram upon
doxycycline addition. We have named the two functional transgenic
lines i4F-A and i4F-B, i4F standing for ‘inducible four factors’. We
determined the integration sites of the transgenes, which in the case of
line i4F-A is within an intron of the Neto2 gene, and in the case of line
i4F-B is within an intron of the Pparg gene (Extended Data Fig. 2a).
Transcription of these two genes remained unaltered in a number of
tissues, either with or without doxycycline (Extended Data Fig. 2b). We
conclude that transgenic lines i4F-A and i4F-B contain a functional
inducible reprogramming transgene that is expressed in most tissues
without affecting the resident endogenous genes.


Reprogrammable mice generate teratomas 

To test the possibility of in vivo reprogramming, we first treated i4F-A 
and i4F-B mice continuously with a high dose of doxycycline (1 mg ml⁻¹)
in the drinking water. This treatment resulted in weight loss and severe 
morbidity in both transgenic lines after 1 week. Histological examina- 
tion of the mice revealed alterations in many tissues, particularly pro- 
found in the intestine and pancreas. The intestinal epithelium showed 
a generalized cytological and architectural dysplasia (Extended Data 
Fig. 3a), probably responsible for the weight loss. A similar phenotype 
has been reported for mice with transgenic expression of Oct4 or c-Myc.
In the case of the pancreas, mice presented multifocal dysplasia (Extended 
Data Fig. 3b). Taking into account these observations, we tested two 
milder induction protocols that turned out to be compatible with the 
long-term survival of the mice. These were a 2.5-week treatment with
low doxycycline (0.2 mg ml⁻¹) and a 1-week treatment with high doxycycline
(1 mg ml⁻¹), both followed by doxycycline withdrawal. Remarkably,
after a variable period of time, treated mice succumbed to the presence 
of tumoral masses (Fig. 1b; Extended Data Fig. 4a), most of which 
consisted of teratomas (Fig. 1c; Extended Data Fig. 4b). Teratomas 
are a particular class of tumours that originate from pluripotent cells 
after a process of expansion and disorganized differentiation. Most of 
the teratomas (32/45, 71%) were well differentiated and presented 
abundant examples of the three embryonic germ layers (Fig. 1d). There- 
fore, the presence of teratomas in our reprogrammable mice is indicative 
of reprogramming into full pluripotency. Mice treated with the long- 
induction/low-doxycycline protocol developed teratomas faster and at 
a higher incidence rate than those treated with the short-induction/
high-doxycycline protocol (Fig. 1e). The incidence of teratomas was
higher in line i4F-A than in i4F-B (Fig. 1f), and in both lines and 
protocols, teratomas appeared in a variety of organs (Fig. 1g). Repro- 
grammable mice that were not treated with doxycycline remained 
healthy at least during 2 years of observation, indicating the absence 
of leaky expression of the reprogramming cassette. The presence of 
multiple teratomas in both lines implies that reprogramming into
pluripotency is feasible within in vivo conditions. 


In vivo reprogramming occurs in multiple tissues 


Previous work has shown that haematopoietic progenitors can reprogram
with high efficiency. This, together with the broad distribution
of haematopoietic cells within the organism, led us to consider that 
teratomas in our reprogrammable mice could originate from cells of 
the haematopoietic lineage. To address this, we performed bone mar- 
row transplants into lethally irradiated hosts. In one setting, bone 
marrow (BM) from reprogrammable mice (i4F-BM) was transplanted 
into wild-type hosts and, in the reciprocal transplantation, wild-type 
bone marrow was transplanted into reprogrammable mice. Interestingly, 
both types of bone marrow reconstituted mice developed multiple 
teratomas upon induction (Fig. 2a, b). These results suggest that
haematopoietic and non-haematopoietic cells can both be reprogrammed
in vivo. Of note, the teratomas present in i4F-BM transplanted mice
were not emerging from organs, but outgrew attached to the serous
membranes of the thoracic and abdominal cavities, whereas reprogrammable
mice (either whole-body or transplanted with wild-type
bone marrow) presented teratomas within multiple organs, as previously
mentioned (Fig. 1g).

Figure 1 | Generation of teratomas upon in vivo induction of the four
factors Oct4, Sox2, Klf4 and c-Myc. a, Reprogrammable mouse
generation. b, Reprogrammable mouse with multiple teratomas
(arrowheads). c, Teratomas in the intestine of a reprogrammable
mouse. d, Histological section of a teratoma with mesoderm (mes),
endoderm (end) and ectoderm (ect). Asterisk indicates giant trophoblast
cells and haemorrhages. e, Survival of reprogrammable mice after the
indicated doxycycline treatments. Time refers to the initiation of the
treatments. f, Incidence of teratomas. Data correspond to their time of
death or week 30. g, Localization of teratomas in mice with teratomas.

The above results prompted us to look for early reprogramming 
events in non-haematopoietic cells. In particular, we focused on the 
stomach, intestine, pancreas and kidney of whole-body reprogram- 
mable mice. We performed double immunohistochemistry against 
the epithelial marker cytokeratin 19 (CK19, also known as KRT19) 
and the pluripotency marker NANOG. We found aberrant individual 
gastric glands and intestinal crypts (in the small and large intestine) 
that had lost or decreased CK19 and expressed NANOG (Fig. 2c, d). 
In some cases the entire gland or crypt was aberrant, whereas in others 
intermediate situations were found. In the pancreas, we observed both 
acinar-like (CK19-negative) and ductal-like (CK19-positive) struc- 
tures with NANOG-positive cells (Fig. 2e). In the kidney, which does 
not undergo major detectable morphological changes upon doxycy- 
cline induction but has a high incidence of teratomas (Fig. 1g), we 
found isolated kidney tubules expressing NANOG (Extended Data 
Fig. 4c). In general, the number of reprogramming events evidenced
by NANOG (Fig. 2d) was clearly lower than the number of cells expressing
the reprogramming cassette (Extended Data Fig. 3a), implying
that reprogramming in vivo is a low-efficiency process that likely
involves stochastic events, similar to what happens during in vitro
reprogramming. The presence of isolated NANOG-positive structures,
such as intestinal crypts or kidney tubules, probably reflects the
clonal expansion of an individual reprogramming event. These observations
support the concept that reprogramming occurs in situ, at least in the
case of the epithelial cells of the stomach, intestine, pancreas and kidney.

Figure 2 | Many cell types are reprogrammed in vivo. a, Incidence of teratomas
in wild-type mice transplanted with reprogrammable bone marrow (BM). Time
refers to the initiation of the treatment. b, Same as a in reprogrammable mice
transplanted with wild-type BM. Ticks indicate censored mice dead without
teratomas due to pulmonary oedemas secondary to irradiation (a) or systemic
i4F induction (b). c, Double immunohistochemistry of NANOG (dark brown)
and cytokeratin 19 (CK19, magenta) in the stomach of whole-body
reprogrammable mice. d, Same staining as c in the large intestine. e, Same
staining as c in the pancreas. All scale bars correspond to 100 µm.


Reprogrammable mice present iPS cells in the blood 


Given the feasibility of in vivo reprogramming, we wondered whether 
it was possible to detect circulating iPS cells in the bloodstream. To 
address this, the cellular fraction of the blood from induced repro- 
grammable mice (~ 10° leukocytes) was seeded into plates with feeder 
fibroblasts and iPS cell culture medium (all procedures after blood 
extraction were performed in the absence of doxycycline). Remarkably, 
after a variable period of time (1-2 weeks), colonies with iPS cell mor- 
phology were visible (Fig. 3a). These colonies were expanded (Fig. 3b) 
and were found to express pluripotency markers (Fig. 3c, d; Extended 
Data Fig. 5a) and to have silenced the lentiviral reprogramming cas- 
sette (Extended Data Fig. 5b). Moreover, in vivo iPS cells generated 
subcutaneous teratomas with representation of the three germ layers
(Fig. 3e), produced mouse chimaeras (Fig. 3f) and contributed to the
germ lineage (Extended Data Fig. 5c). Therefore, we conclude that 
bona fide iPS cells can be isolated from the blood of reprogrammable 
mice. Of note, colonies of iPS cells were obtained from whole-body 
reprogrammable mice, as well as from wild-type mice with reprogram- 
mable bone marrow and from reprogrammable mice with wild-type 
bone marrow (Extended Data Fig. 5d). Therefore, both haematopoietic 
and non-haematopoietic cells can generate in vivo circulating iPS cells. 
The overall frequency of mice with colony-forming iPS cells was 6.5% 
(5/77) (Fig. 3g), and this frequency was similar in the two transgenic 
lines (Extended Data Fig. 5d). In those blood samples that were positive 
for iPS cells colony formation, the number of colonies obtained was 
variable (9 ± 5) (Extended Data Fig. 5d). We will refer to the circulating
iPS cells as in vivo iPS cells to distinguish them from the standard
in vitro generated ones. 


Transcriptomic analysis of in vivo iPS cells 


To further characterize the in vivo iPS cells, we performed messenger 
RNA deep sequencing. We sequenced in vivo iPS cells (n = 6 inde- 
pendent clones), in vitro iPS cells (n = 5 independent clones derived 
from i4F-MEFs) and ES cells (JM8.F6 (ref. 20), Bruce4 (ref. 21) and 
CNIO in-house made C57BL6.10). The homogeneity of the samples of 
each cell type was confirmed by the high intra-group correlation coef- 
ficients (Extended Data Fig. 6a). Furthermore, all of the 14 transcrip- 
tomes analysed were highly similar regardless of their origin (the lowest 
pairwise correlation coefficient was r = 0.93) (Extended Data Fig. 6a). 
Interestingly, inter-group comparisons by scatter plots, volcano plots 
and Pearson coefficient correlations indicated a higher degree of sim- 
ilarity between in vivo iPS and ES cells (r = 0.997), than between the 
other two possible combinations (in vivo iPS cells vs in vitro iPS cells, 
r = 0.971; ES cells vs in vitro iPS cells, r = 0.966) (Extended Data Fig. 
6c). Moreover, unsupervised hierarchical clustering of the 14 transcrip- 
tomes classified together in vivo iPS cells and ES cells, and separated 
them from in vitro iPS cells (Fig. 3h). The same classification was 
obtained using principal component analysis, which is another 
unbiased method to quantify the degree of similarity between large 
data sets (Extended Data Fig. 6b). The lists of differentially expressed 
genes were obtained for further analyses (Supplementary Tables 1-3). 
Interestingly, among the genes commonly upregulated in in vivo iPS 
and ES cells compared to in vitro iPS cells (a total of 51 genes; 
Supplementary Table 4) there were several pluripotency genes, includ- 
ing Gbx2, Lin28a, Utf1 and others associated with pluripotency, such as 
Epcam and Ccne1. The upregulation of these genes in in vivo iPS and ES 
cells was validated by quantitative PCR with reverse transcription 
(qRT-PCR) (Extended Data Fig. 7a). Having established that in vivo 
iPS cells are extremely similar to ES cells, we focused on those few genes 
that were differentially expressed in in vivo iPS cells relative to ES cells 
and in vitro iPS cells (Fig. 3i; Supplementary Table 5). Among these 
genes, we validated the upregulation of Nlrp4f (known to be enriched
at the morula state), Etv4 (a transcription factor of the Ets family
expressed during early development), Ppm1j (a protein phosphatase
transcriptionally upregulated upon GSK3β inhibition),
8430410A17Rik (a gene consistently found associated with stemness)
and Tgm1 (Fig. 3j; Extended Data Fig. 7b), and the downregulation of
Mmp12 and Tnc (both encoding components or regulators of the
extracellular matrix) (Fig. 3j). Importantly, the pattern of expression
of these genes was similar in in vivo iPS cells and in morulas (Fig. 3j;
Extended Data Fig. 7b), thus suggesting that in vivo iPS cells and
morulas share transcriptional features that are absent in ES cells or 
in in vitro iPS cells. We conclude that in vivo iPS cells are extremely 
similar to ES cells, but present differentially expressed genes that could 
conceivably confer additional properties to the in vivo iPS cells. 
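The inter-group Pearson comparisons reported above can be illustrated with a small sketch. How the group-level coefficients were derived is not stated in the text, so averaging all pairwise correlations between the clones of two groups is an assumption, as are the function names and the input format (one list of expression values per clone, in the same gene order).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_intergroup_r(group_a, group_b):
    """Average pairwise correlation between every clone in A and every clone in B."""
    values = [pearson_r(a, b) for a in group_a for b in group_b]
    return sum(values) / len(values)
```

Applied to the three cell types, the pair of groups with the highest average coefficient would be read as the most similar.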


In vivo iPS cells contribute to the trophectoderm 


We noted that the teratomas that appeared in reprogrammable mice 
often presented areas with large cells that resemble trophoblast giant
cells associated with internal haemorrhages, altogether characteristic
of placental tissue (Fig. 4a; see also examples in Figs 1d and 3e). Indeed,
this was confirmed by the expression of PL-1 (placental lactogen 1 or
chorionic somatomammotropin hormone 1) and CK8 (cytokeratin 8
or KRT8), which are both markers of trophoblast giant cells (Fig. 4a).
We were intrigued by this observation because trophoblast differentiation
is rare in teratomas produced by ES cells. To further explore
this, we subjected ES cells, in vitro iPS cells and in vivo iPS cells to
culture conditions that favour differentiation into trophoblast stem
cells (namely, removal of LIF and addition of FGF4 and heparin).
After 5 days, in vivo iPS cells formed abundant colonies with a flattened
morphology characteristic of trophoblast stem cells. In contrast, in vitro
iPS and ES cells produced a lower number of colonies and only a few of
them showed trophoblast stem cell-like morphology (Fig. 4b). Furthermore,
in vivo iPS cells upregulated markers of trophectoderm lineage
(Cdx2, Fgfr2 and Eomes) to a larger extent than equally treated in vitro
iPS or ES cells (Fig. 4c; Extended Data Fig. 8a). The upregulation of Cdx2
in in vivo iPS cells was confirmed by immunofluorescence (Fig. 4d). As a
control, markers of ectoderm (Sox1), mesoderm (T) and endoderm
(Gata6) lineages showed similar levels of expression among all the cell
types examined (Extended Data Fig. 8a). Moreover, upon removal of
FGF4 and heparin, trophoblast stem cell differentiated in vivo iPS cells
generated trophoblast giant cells (Fig. 4e).

Figure 3 | Isolation and characterization of in vivo iPS cells. a, In vivo iPS
cell colony 10 days after blood plating. b, Expansion of in vivo iPS cells.
c, Immunofluorescence of in vivo iPS cell colony. d, Immunoblot of in vivo iPS
cells, in vitro iPS cells (no. 1 from i4F MEFs; no. 2 from MEFs infected with
lenti-OSKM) and C57BL/6.10 ES cells. e, Subcutaneous teratoma from in vivo
iPS cells injection. Asterisk indicates giant trophoblast cells and haemorrhages.
f, In vivo iPS cells derived chimaera. g, Frequency of in vivo iPS cells isolation.
h, Unsupervised hierarchical clustering of in vivo iPS cells, in vitro iPS cells and
ES cells (6, 5 and 3 clones respectively). i, Venn diagram of differentially
expressed genes. j, qPCR analysis of differentially expressed genes in the same
clones as in h. Average ± s.d. and unpaired two-tailed Student’s t-test are
shown. *P < 0.05; **P < 0.01; ***P < 0.001.

Based on the above observations, we tested the capacity of in vivo iPS 
cells to contribute to the trophectoderm. For this, GFP-expressing in 
vivo iPS and ES cells were aggregated or microinjected into morulas 
(either wild type or carrying a Katushka red fluorescent transgene)
and the resulting blastocysts were examined. As expected, both cell 
types, in vivo iPS and ES cells, efficiently contributed (100%) to the 
inner cell mass (ICM). Interestingly, in vivo iPS cells also contributed to 
the polar trophectoderm (surrounding the ICM) and to the mural 
trophectoderm (Fig. 4f; Extended Data Fig. 8b). Altogether, in vivo 
iPS cells contributed to the trophectoderm with a remarkable efficiency 
(56%), which was in contrast to ES cells (0%) (Fig. 4g). To test whether
the in vivo iPS cells actually contribute to the formation of the placenta, 
we examined chimaeric E14.5 embryos generated with GFP-expressing 
in vivo iPS cells and we observed a high degree of chimaerism both in 
the embryo proper as well as in the placenta (Fig. 4h, i; Extended Data 
Fig. 8c). Previous investigators have reported that ES cells and standard 
iPS cells can transitorily access a totipotency-like state similar to the 
2-cell blastomeres (2C state). We wondered whether in vivo iPS cells
are enriched in the 2C state; however, we could not see upregulation of
the 2C markers Zscan4, MuERV-L and IAP (Extended Data Fig. 9). 
Therefore, in vivo iPS cells do not seem to be enriched in the 2C state, 
although, as shown above (Fig. 3j), they share transcriptional features 
with morulas. Together, we conclude that in vivo reprogramming con- 
fers a pluripotency state that, in contrast to ES cells or standard in vitro 
iPS cells, can readily access the trophectodermal lineage. 


In vivo iPS cells generate embryo-like structures 


During the course of the above analyses we observed the presence of 
small cysts in the thoracic and abdominal cavities of two reprogram- 
mable mice (from a total of 77 induced reprogrammable mice) (Fig. 5a).
These cysts were often detached from the surrounding organs and 
from the lining of the abdominal and thoracic cavities (Fig. 5a). The 
cysts were formed by membranous structures and, based on their 
detailed characterization (see below), we refer to them as embryo-like 
structures. We wondered whether in vivo iPS cells could also form 
embryo-like structures when injected intraperitoneally into wild-type 
mice. Remarkably, in addition to the expected teratomas, a fraction of 
mice injected with in vivo iPS cells contained embryo-like structures, in 
contrast to the mice injected with in vitro iPS or ES cells (Fig. 5b). 
Immunohistological analyses of the embryo-like structures indicated 
the presence of cell layers and cellular areas expressing lineage markers 
SOX2 (ectoderm), T/BRACHYURY (mesoderm), or GATA4 (endo- 
derm) (Fig. 5c; Extended Data Fig. 10). In addition, these embryo-like 
structures also expressed CDX2 (Fig. 5c; Extended Data Fig. 10), indicative
of trophectoderm lineage, and presented cell layers with the
typical border morphology of the yolk sac endoderm which co-expressed
α-fetoprotein (AFP) and cytokeratin 8 (CK8) (Fig. 5c; Extended Data
Fig. 10), both characteristic of the visceral endoderm of the yolk sac.
Finally, embryo-like structures presented regions resembling blood 
islands, internally lined by the endothelial cell surface marker LYVE-1 
(ref. 34) and with associated nucleated erythrocytes positive for the 
TER-119 marker (Fig. 5d), highly suggestive of yolk sac associated 
erythropoiesis. We conclude that in vivo iPS cells possess an unpre- 
cedented cell-autonomous capacity to produce embryo-like structures 
containing the three embryonic germ layers together with structures 
reminiscent of the extraembryonic ectoderm and the yolk sac. This 
reinforces the concept that in vivo reprogramming allows the acquisi- 
tion of totipotency features that are absent in ES cells or in standard in 
vitro reprogrammed iPS cells. 


Conclusion 


In this work, we demonstrate that the four factors Oct4, Sox2, Klf4 and
c-Myc can induce dedifferentiation and pluripotency in a variety of 
cell types in vivo, including cells from the haematopoietic lineage, as 
Figure 4 | In vivo iPS cells efficiently contribute to the trophectoderm.
a, Teratomas with trophoblast giant cells. b, Trophoblast stem cell
differentiation of the indicated cells. c, Cdx2 expression in in vivo iPS cells, in
vitro iPS cells and ES cells (6, 5 and 3 clones, respectively) during TS
differentiation, relative to day 0. Average ± s.d. and unpaired, two-tailed
Student’s t-test are shown: *P < 0.05, **P < 0.01. d, Immunofluorescence of in
vivo iPS cell-derived trophoblast stem cells. e, Giant cells differentiated from
trophoblast stem cells and in vivo iPS cell-derived trophoblast stem cells.
f, Chimaeric blastocysts from GFP in vivo iPS cells and GFP-ES cells.
Arrowheads mark GFP+ trophectoderm cells. g, Frequency of blastocysts with
GFP+ trophectoderm cells from in vivo iPS cells (n = 2 clones) and ES cells
(JM8.F6). Fisher’s exact test: **P < 0.01. h, GFP in vivo iPS cells chimaeric
embryo and placenta. i, Immunostaining against GFP.


well as epithelial cells from the stomach, intestine, pancreas and kidney.
In the context of previous examples of in vivo cellular conversions
(refs 4-10), our results notably extend the concept of in vivo plasticity to
many tissues and to the extreme case of generating embryonic plur- 
ipotent cells, a cell type that is absent in the adult organism. Previous 
investigators have shown that intentionally incomplete in vitro repro- 
gramming with the four factors triggers a dedifferentiated cellular state 
that can have advantageous differentiation properties (refs 35, 36). In this
regard, partial or transient activation of the four factors in vivo is an 
attractive approach for regenerative purposes. 

Another important aspect of our work is the recovery of circulating 
iPS cells from the blood of induced reprogrammable mice, which can 
derive from both haematopoietic and non-haematopoietic cells. 
Notably, in vivo iPS and ES cells are extremely similar, and clearly 
separated from in vitro iPS cells. Despite the high similarity between 
in vivo iPS and ES cells, it is possible to detect differentially expressed 
genes among the two cell types. These differences may originate from 
differential epigenetic marks and/or self-sustained transcriptional networks,
both of which are prominent regulators of pluripotency (refs 37, 38). In vivo
iPS cells present a remarkable capacity to undergo trophectoderm lineage 
differentiation, a property that is largely absent in ES cells (or in vitro 
Figure 5 | In vivo reprogramming and in vivo iPS cells generate embryo-like
structures. a, Cysts in the abdominal cavity of a reprogrammable mouse.
b, Frequency of embryo-like structures after intraperitoneal injection of in vivo
iPS cells (3 clones), in vitro iPS cells (2 clones) and ES cells (JM8.F6). Fisher’s
exact test: *P < 0.05. c, Cyst generated by intraperitoneal injection. Left panels,
germ layer markers: SOX2 (ectoderm), T/BRACHYURY (mesoderm) and
GATA4 (endoderm). Right panels, extraembryonic markers: CDX2
(trophectoderm), and AFP and CK8, both specific for visceral endoderm of the
yolk sac. d, Cyst generated by intraperitoneal injection presenting TER-119+
nucleated erythrocytes and LYVE-1 endothelial cells in structures resembling
yolk sac blood islands.


iPS cells) under normal culture conditions. Finally, in vivo iPS cells 
have an unprecedented capacity to form embryo-like structures, inclu- 
ding the three germ layers of the proper embryo and extraembryonic 
tissues, such as extraembryonic ectoderm and yolk sac like tissue with 
associated embryonic erythropoiesis. Together, we conclude that in 
vivo iPS cells represent a more primitive or plastic state than ES cells. 
Future work will explore the full capabilities of in vivo iPS cells. 


METHODS SUMMARY 

In vivo iPS cell isolation. Peripheral blood (0.3-0.5 ml) was collected directly 
from the heart of doxycycline-induced i4F mice at the time of necropsy. After 
blood extraction, all procedures were performed in the absence of doxycycline. 
The recovered cells were plated on feeders and cultured in iPS cell medium. 
Generation of embryo-like structures. Wild-type C57BL/6 mice were injected 
with 5 X 10° cells in 100 µl of iPS cell medium. When teratomas were palpable, 
usually around 2 months post-injection, mice were euthanized. 


Online Content Any additional Methods, Extended Data display items and 
Source Data are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 4 March; accepted 23 August 2013. 


Published online 11 September; corrected online 16 October 2013 (see full-text 
HTML version for details). 


1. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse
embryonic and adult fibroblast cultures by defined factors. Cell 126, 663-676
(2006).


©2013 Macmillan Publishers Limited. All rights reserved 


2. Robinton, D. A. & Daley, G. Q. The promise of induced pluripotent stem cells in research and therapy. Nature 481, 295-305 (2012).

3. Maherali, N. & Hochedlinger, K. Guidelines and techniques for the generation of induced pluripotent stem cells. Cell Stem Cell 3, 595-605 (2008).

4. Cobaleda, C., Jochum, W. & Busslinger, M. Conversion of mature B cells into T cells by dedifferentiation to uncommitted progenitors. Nature 449, 473-477 (2007).

5. Zhou, Q., Brown, J., Kanarek, A., Rajagopal, J. & Melton, D. A. In vivo reprogramming of adult pancreatic exocrine cells to β-cells. Nature 455, 627-632 (2008).

6. Qian, L. et al. In vivo reprogramming of murine cardiac fibroblasts into induced cardiomyocytes. Nature 485, 593-598 (2012).

7. Song, K. et al. Heart repair by reprogramming non-myocytes with cardiac transcription factors. Nature 485, 599-604 (2012).

8. Banga, A., Akinci, E., Greder, L. V., Dutton, J. R. & Slack, J. M. In vivo reprogramming of Sox9+ cells in the liver to insulin-secreting ducts. Proc. Natl Acad. Sci. USA 109, 15336-15341 (2012).

9. Rouaux, C. & Arlotta, P. Direct lineage reprogramming of post-mitotic callosal neurons into corticofugal neurons in vivo. Nature Cell Biol. 15, 214-221 (2013).

10. Torper, O. et al. Generation of induced neurons via direct conversion in vivo. Proc. Natl Acad. Sci. USA 110, 7038-7043 (2013).

11. Stadtfeld, M., Maherali, N., Borkent, M. & Hochedlinger, K. A reprogrammable mouse strain from gene-targeted embryonic stem cells. Nature Methods 7, 53-55 (2010).

12. Carey, B. W., Markoulaki, S., Beard, C., Hanna, J. & Jaenisch, R. Single-gene transgenic mouse strains for reprogramming adult somatic cells. Nature Methods 7, 56-59 (2010).

13. Haenebalcke, L. et al. The ROSA26-iPSC mouse: a conditional, inducible, and exchangeable resource for studying cellular (de)differentiation. Cell Rep. 3, 335-341 (2013).

14. Hochedlinger, K., Yamada, Y., Beard, C. & Jaenisch, R. Ectopic expression of Oct-4 blocks progenitor-cell differentiation and causes dysplasia in epithelial tissues. Cell 121, 465-477 (2005).

15. Carey, B. W. et al. Reprogramming of murine and human somatic cells using a single polycistronic vector. Proc. Natl Acad. Sci. USA 106, 157-162 (2009).

16. Finch, A. J., Soucek, L., Junttila, M. R., Swigart, L. B. & Evan, G. I. Acute overexpression of Myc in intestinal epithelium recapitulates some but not all the changes elicited by Wnt/beta-catenin pathway activation. Mol. Cell. Biol. 29, 5306-5315 (2009).

17. Eminli, S. et al. Differentiation stage determines potential of hematopoietic cells for reprogramming into induced pluripotent stem cells. Nature Genet. 41, 968-976 (2009).

18. Yamanaka, S. Elite and stochastic models for induced pluripotent stem cell generation. Nature 460, 49-52 (2009).

19. Hanna, J. et al. Human embryonic stem cells with biological and epigenetic characteristics similar to those of mouse ESCs. Proc. Natl Acad. Sci. USA 107, 9222-9227 (2010).

20. Pettitt, S. J. et al. Agouti C57BL/6N embryonic stem cells for mouse genetic resources. Nature Methods 6, 493-495 (2009).

21. Hughes, E. D. et al. Genetic variation in C57BL/6 ES cell lines and genetic instability in the Bruce4 C57BL/6 ES cell line. Mamm. Genome 18, 549-558 (2007).

22. Assou, S. et al. Transcriptome analysis during human trophectoderm specification suggests new roles of metabolic and epigenetic genes. PLoS ONE 7, e39306 (2012).

23. Koo, T. B. et al. Differential expression of the PEA3 subfamily of ETS transcription factors in the mouse ovary and peri-implantation uterus. Reproduction 129, 651-657 (2005).

24. Yao, X. Q. et al. Glycogen synthase kinase-3β regulates leucine-309 demethylation of protein phosphatase-2A via PPMT1 and PME-1. FEBS Lett. 586, 2522-2528 (2012).

25. Glover, C. H. et al. Meta-analysis of differentiating mouse embryonic stem cell gene expression kinetics reveals early change of a small gene set. PLoS Comput. Biol. 2, e158 (2006).

26. Koh, K. P. et al. Tet1 and Tet2 regulate 5-hydroxymethylcytosine production and cell lineage specification in mouse embryonic stem cells. Cell Stem Cell 8, 200-213 (2011).

27. Lu, C. W. et al. Ras-MAPK signaling promotes trophectoderm formation from embryonic stem cells and mouse embryos. Nature Genet. 40, 921-926 (2008).

28. Ng, R. K. et al. Epigenetic restriction of embryonic cell lineage fate by methylation of Elf5. Nature Cell Biol. 10, 1280-1290 (2008).

29. Diéguez-Hurtado, R. et al. A Cre-reporter transgenic mouse expressing the far-red fluorescent protein Katushka. Genesis 49, 36-45 (2011).

30. Beddington, R. S. & Robertson, E. J. An assessment of the developmental potential of embryonic stem cells in the midgestation mouse embryo. Development 105, 733-737 (1989).

31. Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57-63 (2012).

32. Pfister, S., Steiner, K. A. & Tam, P. P. Gene expression pattern and progression of embryogenesis in the immediate post-implantation period of mouse development. Gene Expr. Patterns 7, 558-573 (2007).

33. Conley, B. J., Trounson, A. O. & Mollard, R. Human embryonic stem cells form embryoid bodies containing visceral endoderm-like derivatives. Fetal Diagn. Ther. 19, 218-223 (2004).

34. Gordon, E. J., Gale, N. W. & Harvey, N. L. Expression of the hyaluronan receptor LYVE-1 is not restricted to the lymphatic vasculature; LYVE-1 is also expressed on embryonic blood vessels. Dev. Dyn. 237, 1901-1909 (2008).

35. Thier, M. et al. Direct conversion of fibroblasts into stably expandable neural stem cells. Cell Stem Cell 10, 473-479 (2012).

36. Kurian, L. et al. Conversion of human fibroblasts to angioblast-like progenitor cells. Nature Methods 10, 77-83 (2013).

37. Halley, J. D. et al. Self-organizing circuitry and emergent computation in mouse embryonic stem cells. Stem Cell Res. 8, 324-333 (2012).

38. Marks, H. et al. The transcriptional and epigenomic foundations of ground state pluripotency. Cell 149, 590-604 (2012).


Supplementary Information is available in the online version of the paper.


Acknowledgements We are grateful to M. Torres for advice, and to K. Hochedlinger and
R. Jaenisch for reagents. We also thank F. Beier, R. Serrano and N. Soberón for technical
support. Work in the laboratory of M.S. is funded by the CNIO and by grants from the
Spanish Ministry of Economy (MINECO, SAF), the Regional Government of Madrid
(ReCaRe), the European Union (RISK-IR), the European Research Council (ERC
Advanced Grant), the Botin Foundation, the Ramon Areces Foundation and the AXA
Foundation. Work in the laboratory of M.M. is funded by grants from the MINECO (BFU),
the Regional Government of Madrid (Cell-DD) and the ProCNIC Foundation. The
funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.


Author Contributions M.A. performed most of the experiments, contributed to
experimental design, data analysis, discussion and writing; L.M. performed a
substantial amount of experimental work, contributed to experimental design, data
analysis, discussion and writing; C.P. contributed to experimental work, data analysis,
discussion and writing; M.C. performed all the histopathological and
immunohistochemical analyses; T.R. and I.O. contributed to the trophoblast stem cell
and giant cell differentiation assays; O.G. analysed the RNAseq data; D. Megias
supervised and helped with the confocal microscopy; O.D. performed RNAseq and
determined the lentiviral genomic insertion sites; D. Martinez performed cell sorting
and contributed to the bone marrow and peripheral blood analyses; M.M. supervised
trophoblast differentiation assays and gave advice; S.O. generated the transgenic mice,
constructed chimeras, and performed morula and blastocyst assays; M.S. designed and
supervised the study, secured funding, analysed the data, and wrote the manuscript. All
authors discussed the results and commented on the manuscript.


Author Information The primary RNA-seq data has been deposited in the GEO 
repository under accession number GSE48364. Reprints and permissions information 
is available at www.nature.com/reprints. The authors declare no competing financial 
interests. Readers are welcome to comment on the online version of the paper. 
Correspondence and requests for materials should be addressed to M.S.
(mserrano@cnio.es).


17 OCTOBER 2013 | VOL 502 | NATURE | 345 


METHODS 


Generation of i4F reprogrammable mice. To generate reprogrammable mice, 
we transduced C57BL/6 mouse embryonic fibroblasts (MEFs) carrying a doxycy- 
cline-inducible transcriptional activator (rtTA) within the Rosa26 locus" (gener- 
ously provided by K. Hochedlinger) with a lentivirus carrying a doxycycline- 
inducible tetracistronic cassette with the four murine reprogramming factors 
(Oct4, Sox2, Klf4, c-Myc)'* (Tet-O-FUW-OSKM, obtained from Addgene #20321). 
After lentiviral transduction, MEFs were treated with 1 µg ml⁻¹ of doxycycline and
colonies of iPS cells appeared after 1 week. Several iPS cell colonies were picked,
expanded and microinjected into albino C57BL/6J-Tyr<c-2J>/J E3.5 blastocysts (5-7
iPS cells per blastocyst) to obtain chimaeras. Chimaeric mice were backcrossed 
with C57BL6/J mice until the lentiviral transgenes were transmitted at Mendelian 
proportions (indicative of single integration site). The resulting reprogrammable 
i4F mice used in this study are in a pure C57BL6/J genetic background. 
Southern blotting. Genomic DNA (tail tip) was digested overnight with BamHI
and hybridized with probes designed to recognize exonic sequences of Sox2 and
Klf4. The probes were generated by PCR of genomic DNA with the following
primers: Sox2-F: 5'-TACAGCATGATGCAGGAGCA-3'; Sox2-R:
5'-CTGGGCCATGTGCAGTCTAC-3'; Klf4-F: 5'-CAGCTTCAGCTATCCGATCC-3'; Klf4-R:
5'-CGCCTCTTGCTTAATCTTGG-3'.

Transgene insertion site determination. We performed gene walking as described”. 
Insertion sites were confirmed by PCR using primers against genomic sequences
around the insertion site (Neto2 in the case of line i4F-A, primer
5'-GCGTCAGGCAATTTATACTCTGG-3'; and Pparg in the case of line i4F-B, primer
5'-CAGCATCAAATGGCTCGGTA-3') and against the lentiviral transgene
(5'-GCACCATCCAAAGGTCAGTG-3').

Animal procedures. Animal experimentation at the CNIO, Madrid, was per- 
formed according to protocols approved by the CNIO-ISCIII Ethics Committee 
for Research and Animal Welfare (CEIyBA). Doxycycline (Sigma) was administered
in the drinking water supplemented with 7.5% of sucrose. Experiments were
performed indistinguishably with mice of both sexes and from 2 to 6 months of 
age. For bone marrow (BM) transplant, groups of 8 wild-type mice (C57BL6/J, 
8-10-weeks old) per donor mouse were irradiated with 12 Gy. The following day, 
the bone marrow of the donor mice was harvested from the femora and tibiae and 
2X 10°-2.5 X 10° cells, suspended in Leibovitz medium (Sigma, L5520), were 
intravenously injected per recipient. Experiments on transplanted mice were 
performed after a latency of at least 30 days to ensure BM reconstitution. For 
the intraperitoneal injections, wild-type mice were injected with 5 X 10° cells 
suspended in 100 µl of iPS cell medium. For subcutaneous teratomas, iPS or ES
cells were trypsinized and 2 X 10° cells were subcutaneously injected into the
flanks of immunocompromised nude mice (Swiss nude from Charles River).
Teratomas were isolated when the diameter reached >1.5 cm and processed
for histological analysis. 

Cell culture. Primary mouse embryonic fibroblasts (MEFs) were obtained from 
embryos at E13.5 and cultured in DMEM supplemented with 10% of FBS and 
penicillin-streptomycin. The following C57BL6 ES cell lines were used: JM8.F6 
(ref. 20), Bruce4 (ref. 21), and CNIO in-house made C57BL6.10. Cultures were 
routinely tested for mycoplasma and were always negative. ES and iPS cells were 
cultured over mitomycin-C inactivated feeder cells on gelatin-coated plates and 
in ‘iPS medium’: high-glucose DMEM supplemented with KSR (15%, Invitrogen),
LIF (1,000 U ml⁻¹), non-essential amino acids, penicillin-streptomycin, glutamax
and β-mercaptoethanol. For lentiviral transduction, we transfected HEK293T
(5 X 10°) cells with Tet-O-FUW-OSKM (Addgene #20321) and packaging vectors
using Fugene HD (Roche). Viral supernatants were collected twice a day on two
consecutive days starting 24 h after transfection and were used to infect ROSA26-rtTA
MEFs, previously plated at a density of 2 X 10° cells per well in 6-well plates.
Before infection, polybrene was added to the viral supernatants at a concentration
of 8 µg ml⁻¹. For in vitro i4F reprogramming, i4F MEFs were plated at a
density of 5 X 10° cells per well in 6-well gelatin-coated plates, and at a density of
3 X 10° for the kinetics assay. Infected MEFs or i4F-MEFs were cultured in iPS cell
medium with doxycycline (1 µg ml⁻¹). Medium was changed every 48 h until iPS
cell colonies appeared (after ~7 days of treatment). Reprogramming plates were
stained for alkaline phosphatase activity (AP detection kit, Sigma-Aldrich). When 
indicated, ES cells or in vivo iPS cells were retrovirally infected with a vector 
expressing GFP (pMSCV-PIG) and infected cells were sorted by FACS. 

In vivo iPS cell isolation. Peripheral blood (0.3-0.5 ml) was collected directly 
from the heart of i4F mice at the time of necropsy, and was subjected to two 
rounds of erythrocyte lysis in ammonium chloride solution (Stem Cells). First 
round of lysis with 10 ml, for 15 min at room temperature, followed by centrifu- 
gation, and a second round of lysis with 3 ml, for 15 min at room temperature, 
followed by neutralization with 12 ml of iPS cell medium. Cells were pelleted and 
counted, recovering ~10° cells per mouse. Cells were resuspended, plated on 
feeders and cultured in iPS cell medium. 


Immunofluorescence. Cells previously seeded on coverslips were fixed in 4%
paraformaldehyde for 20 min, permeabilized (PBS 0.1% Triton X-100) for 15 min
and blocked in FBS for 1 h at room temperature. For the detection of OCT4 we
used two antibodies with similar results, BD 611203, dilution 1:200, and Santa 
Cruz sc-5279, dilution 1:400; for NANOG, Novus NB100 58842, dilution 1:50; 
and for CDX2, Epitomics #2475-1, dilution 1:400. Cells were inspected under a 
Leica SP5 microscope equipped with white light laser and hybrid detection. 
Chimera generation and germline contribution. For chimaera generation, in 
vivo iPS cells (5-7 cells per embryo, ~10 passages) were microinjected into 
C57BL/6J-Tyr<c-2J>/J blastocysts and transferred to Crl:CD1 (ICR) pseudopregnant
females. To study the contribution to germline, GFP-infected in vivo iPS cells 
(~14 passages) were similarly microinjected into blastocysts and the gonads from 
chimaeric male E14.5 embryos were isolated, fixed in 4% paraformaldehyde, and 
analysed for GFP fluorescence in whole mount with laser scanning confocal 
microscope SP5 from Leica, equipped with white light laser and hybrid detection. 
Lenses used for imaging were ×20 (dry lens) with a 0.7 numerical aperture and ×63
(water lens) with a 1.2 numerical aperture.

RNA-seq methods. Total RNA was extracted from ES cells (JM8.F6 (ref. 20), 
Bruce4 (ref. 21), and CNIO in-house made C57BL6.10), in vitro iPS cells (3 from 
i4F-A MEFs and 2 from i4F-B MEFs) and in vivo iPS cells (3 from i4F-A mice and 
3 from i4F-B mice), all with ~10 passages. 1 µg of total RNA, with RIN (RNA
integrity number) values in the range 9.8 to 10 (Agilent 2100 Bioanalyzer), was
used. PolyA+ fractions were processed using TruSeq Stranded mRNA Sample
Preparation Kit (Agilent). The resulting directional cDNA libraries were sequenced
for 40 bases in a single-read format (Genome Analyzer IIx, Illumina). The complete
set of reads has been deposited in the GEO repository (accession number
GSE48364). Reads were aligned to the mouse genome (GRCm38/mm10) with
TopHat-2.0.4 (ref. 40) (using Bowtie 0.12.7 (ref. 41) and Samtools 0.1.16 (ref. 42)),
allowing two mismatches and five multihits. Transcript assembly, estimation of
their abundances and differential expression were calculated with Cufflinks 1.3.0 
(ref. 40), using the mouse genome annotation data set GRCm38/mm10 from the 
UCSC Genome Browser. 
Trophectoderm stem cell differentiation. ES and iPS cells (all with ~10 passages)
were plated on feeders (7 X 10* cells per well in 6-well plates) in iPS cell
medium and 24 h later the medium was changed to TS differentiation medium, which
contained the following components: 3 volumes of RPMI 1640 (with 20% FBS,
1 mM pyruvate, 2 mM L-glutamine, 100 µM β-mercaptoethanol), 7 volumes of
conditioned medium from mitomycin-C-inactivated fibroblasts, 25 ng ml⁻¹ of
FGF4 (R&D Systems, 235-F4-025) and 1 µg ml⁻¹ of heparin (Sigma, H3149).
Medium was changed every other day, and all cells were split once at day 2. For
giant cell differentiation, trophoblast stem cell differentiated in vivo iPS cells and
established trophoblast stem cells were plated on gelatine and cultured in RPMI
1640 (with 20% FBS, 1 mM pyruvate, 2 mM L-glutamine, 100 µM β-mercaptoethanol)
in the absence of heparin and FGF4 for 3 days.
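
The volume ratio given above for the TS differentiation medium (3 volumes of supplemented RPMI 1640 to 7 volumes of conditioned medium) can be turned into working volumes with a short calculation. The Python sketch below is only an illustration and not part of the published protocol: the function names `split_by_parts` and `supplement_amount` are hypothetical, and the 50 ml batch size is an assumed example value.

```python
def split_by_parts(total_volume_ml, parts):
    """Split a total volume into components according to a parts ratio.

    For a 3:7 ratio, parts = [3, 7]. Returns one volume (ml) per part.
    """
    total_parts = sum(parts)
    if total_parts <= 0:
        raise ValueError("parts must sum to a positive number")
    return [total_volume_ml * p / total_parts for p in parts]


def supplement_amount(final_conc_per_ml, total_volume_ml):
    """Amount of a supplement needed for a given final concentration.

    The result is in the same mass unit as final_conc_per_ml
    (for example ng if the concentration is given in ng per ml).
    """
    return final_conc_per_ml * total_volume_ml


# Assumed example: prepare a 50 ml batch at the 3:7 ratio.
rpmi_ml, conditioned_ml = split_by_parts(50.0, [3, 7])
print(f"RPMI base: {rpmi_ml} ml, conditioned medium: {conditioned_ml} ml")
```

The supplement concentrations are deliberately left as caller-supplied arguments, so the helper makes no claim about the values used in the study.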

Analysis of trophectoderm lineage contribution. GFP-labelled ES cells or in 
vivo iPS cells (5-7 cells, all with ~14 passages) were aggregated or microinjected 
into 8-cell stage embryos using standard techniques*’. Morulas were incubated 
overnight and observed under the confocal microscope in KSOM (Chemical 
International) microdrops under mineral oil. In some cases Tg-CAG-Katushka
embryos (ref. 29) were used as recipients. To study the contribution of GFP-in vivo iPS
cells to the placenta, blastocysts were transferred to pseudopregnant females and 
embryos (E14.5) with their placentas were collected in PBS and observed directly 
under a fluorescence-equipped stereomicroscope or fixed for immunostaining. 
Analysis of mRNA levels. Total RNA was extracted from cell or tissue samples with
Trizol (Invitrogen), following the provider’s recommendations, and retrotranscribed
into cDNA following the manufacturer’s protocol (Maxima First Strand cDNA
synthesis Kit for RT-qPCR, Fermentas). Quantitative real-time PCR was performed
using SYBR Green Power PCR Master Mix (Applied Biosystems) in an ABI PRISM
7700 thermocycler (Applied Biosystems). For input normalization, we used the
housekeeping genes Actb (β-actin) or Gapdh. The primers used were:
Actb forward primer 5'-GGCACCACACCTTCTACAATG-3', Actb reverse primer 5'-GTGGTGGTGAAGCTGTAGCC-3';
Ccne forward primer 5'-GTGGCTCCGACCTTTCAGTC-3', Ccne reverse primer 5'-CACAGTCTTGTCAATCTTGGCA-3';
Cdx2 forward primer 5'-CAAGGACGTGAGCATGTATCC-3', Cdx2 reverse primer 5'-GTAACCACCGTAGTCCGGGTA-3';
E2A-c-Myc forward primer 5'-GGCTGGAGATGTTGAGAGCAA-3', E2A-c-Myc reverse primer 5'-AAAGGAAATCCAGTGGCGC-3';
Eomes forward primer 5'-TTCACCTTCTCAGAGACACAGTTCAT-3', Eomes reverse primer 5'-GAGTTAACCTGTCATTTTCTGAAGCC-3';
Epcam forward primer 5'-GCGGCTCAGAGAGACTGTG-3', Epcam reverse primer 5'-CCAAGCATTTAGACGCCAGTTT-3';
Etv4 forward primer 5'-TGGTGATCAAACAGGAGCG-3', Etv4 reverse primer 5'-GGGTGGAGGTACATTGATGC-3';
Fgfr2 forward primer 5'-GAGGAATACTTGGATCTCACC-3', Fgfr2 reverse primer 5'-CTGGTGCTGTCCTGTTTGGG-3';
Gapdh forward primer 5'-TTCACCACCATGGAGAAGGC-3', Gapdh reverse primer 5'-CCCTTTTGGCTCCACCCT-3';
Gata6 forward primer 5'-TCATTACCTGTGCAATGCATGCGG-3', Gata6 reverse primer 5'-ACGCCATAAGGTAGTGGTTGTGGT-3';
Gbx2 forward primer 5'-CAACTTCGACAAAGCCGAGG-3', Gbx2 reverse primer 5'-ACTCGTCTTTCCCTTGCCCT-3';
IAP forward primer 5'-CAGACTGGGAGGAAGAAGCA-3', IAP reverse primer 5'-ATTGTTCCCTCACTGGCAAA-3';
Lin28a forward primer 5'-GAAGAACATGCAGAAGCGAAGA-3', Lin28a reverse primer 5'-CCGCAGTTGTAGCACCTGTCT-3';
Mmp12 forward primer 5'-CTGCTCCCATGAATGACAGTG-3', Mmp12 reverse primer 5'-AGTTGCTTCTAGCCCAAAGAAC-3';
MuERV-L forward primer 5'-CCCATCATGAGCTGGGTACT-3', MuERV-L reverse primer 5'-CGTGCAGAGCCATCAGTAAA-3';
Nanog forward primer 5'-CAAGGGTCTGCTACTGAGATGCTCTG-3', Nanog reverse primer 5'-TTTTGTTTGGGACTGGTAGAAGAATCAG-3';
Neto2 forward primer 5'-GTCGTGGAAGGGATTGCTGT-3', Neto2 reverse primer 5'-AAGCAAAATGACCTCCATTGC-3';
Nlrp4 forward primer 5'-TGTCCTGAATGAAGGAGACCA-3', Nlrp4 reverse primer 5'-TTACTCCTTACAAACACAGAGCACA-3';
Oct4 (total) forward primer 5'-GTTGGAGAAGGTGGAACCAA-3', Oct4 (total) reverse primer 5'-CCAAGGTGATCCTCTTCTGC-3';
Pparg forward primer 5'-GGCCGAGAAGGAGAAGCTGTTG-3', Pparg reverse primer 5'-TGGCCACCTCTTTGCTCTGCTC-3';
Ppm1j forward primer 5'-AGAGCAGGCACAATGAGGAT-3', Ppm1j reverse primer 5'-CATCAAACAGGCCCCAGTAG-3';
Sox1 forward primer 5'-TGAACGCCTTCATGGTGTGGTC-3', Sox1 reverse primer 5'-GCGCGGCCGGTACTTGTAAT-3';
Sox2 forward primer 5'-CGTAAGATGGCCCAGGAGAA-3', Sox2 reverse primer 5'-GCTTCTCGGTCTCGGACAAA-3';
Sox2-Klf4 forward primer 5'-ACTGCCCCTGTCGCACAT-3', Sox2-Klf4 reverse primer 5'-CATGTCAGACTCGCCAGGTG-3';
T forward primer 5'-GCTTCAAGGAGCTAACTAACGAG-3', T reverse primer 5'-CCAGCAAGAAAGAGTACATGGC-3';
Tgm1 forward primer 5'-CAGATCTGCCCTCAGGCTT-3', Tgm1 reverse primer 5'-CCATTCTTGACGGACTCCAC-3';
Tnc forward primer 5'-ACCATGGGTACAGGCTGTTG-3', Tnc reverse primer 5'-CCTTCTGCACTGAAGTTGCC-3';
Utf1 forward primer 5'-TGTCCCGGTGACTACGTCT-3', Utf1 reverse primer 5'-CCCAGAAGTAGCTCCGTCTCT-3';
Zscan4 forward primer 5'-GAGATTCATGGAGAGTCTGACTGATGAGTG-3', Zscan4 reverse primer 5'-GCTGTTGTTTCAAAAGCTTGATGACTTC-3';
8430410A17Rik forward primer 5'-TGGATTCTACGAGTGGCAGC-3', 8430410A17Rik reverse primer 5'-CTGTCTGAAGCATCGTTCCC-3'.
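
The normalization step described above (target gene signal normalized to a housekeeping gene such as Actb or Gapdh, then expressed relative to a reference sample) is often computed with the 2^-ΔΔCt approach. The Methods do not state which formula was used, so the Python sketch below is an assumption for illustration only: it presumes roughly 100% amplification efficiency, the function names are hypothetical, and the Ct values are invented.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target / ct_reference: Ct values of the target and housekeeping
    gene in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (for example, day 0 or wild-type).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def summarize(fold_changes):
    """Average and sample standard deviation of biological replicates."""
    return mean(fold_changes), stdev(fold_changes)


# Invented Ct values for three replicates of one hypothetical condition.
replicates = [
    relative_expression(22.0, 18.0, 25.0, 18.0),
    relative_expression(22.5, 18.2, 25.1, 18.0),
    relative_expression(21.8, 17.9, 24.9, 18.1),
]
avg, sd = summarize(replicates)
print(f"fold change: {avg:.2f} +/- {sd:.2f} (average +/- s.d.)")
```

If primer efficiencies differ markedly from 100%, an efficiency-corrected formula would be the safer choice.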

Protein analysis. Tissue samples (50-100 mg) were homogenized in medium- 
salt lysis buffer (150 mM NaCl, 50 mM Tris pH 8, 1% NP40 and protein inhibitors 
cocktail) using Precellys 24 homogenizer. A total protein amount of 30 µg was
loaded per lane in a NuPAGE 4-12% Bis-Tris gel 1.0 mm (Invitrogen) and elec- 
trophoresed in MES SDS Running Buffer (Invitrogen). The following antibodies 
were used: for OCT4, Santa Cruz Biotechnology sc-9081, 1:500; for NANOG, 
Millipore AB 5731, 1:5,000; for SOX2, Santa Cruz sc-17320, 1:500; and for actin, 
Sigma-Aldrich AC-15, 1:5,000. 
Immunohistochemistry. Tissue samples were fixed in 10% formalin, paraffin-embedded
and cut in 3-µm sections, which were mounted on Superfrost Plus slides
and re-hydrated. For immunohistochemistry, paraffin sections underwent
standard antigen retrieval with CC1 buffer in the Discovery XT (Roche) system.
The following primary antibodies were used: for
NANOG, Cell Signaling Technology, 8822; for cytokeratin 19 (CK19), CNIO
Monoclonal Antibodies Core Unit, AM-TROMA III; for placental lactogen 1 (PL-1),
Santa Cruz Biotechnology, sc34713; for cytokeratin 8 (CK8), CNIO Monoclonal
Antibodies Core Unit, AM-TROMA I; for GFP, Roche, 11814460001; for SOX2,
Cell Signaling Technology, 3728; for T/BRACHYURY, Santa Cruz Biotechnology,
sc17743; for GATA4, Santa Cruz Biotechnology, sc1237; for CDX2, Biogenex,
MU392A-UC; for α-fetoprotein (AFP), R&D Systems, AF5369; and for OCT4,
Santa Cruz Biotechnology, sc-9081. Slides were then incubated with the corresponding
secondary antibodies conjugated with peroxidase from Dako.
Statistical methods. Sample sizes for comparisons between cell types or between 
mouse genotypes followed Mead’s recommendations™. In particular, the accumu- 
lated n value (N) for a given comparison minus the number of groups or treat- 
ments (T) (for example, genotypes) was between 10 and 20, as recommended. 
Samples (cells or mice) were allocated to their experimental groups according to 
their pre-determined type (cell type or mouse genotype) and therefore there was 
no randomization. Investigators were not blinded to the experimental groups (cell 
types or mouse genotypes). In the case of Fig. 1e, no mice were censored. In the
case of Fig. 2a, b, only mice that died with teratomas were considered, as indicated 
in the ordinate axes; mice that died due to other complications were censored and 
indicated with ticks in the Kaplan-Meier curves. Quantitative PCR data were 
obtained from independent biological replicates (n values indicated in the corres- 
ponding figure legends) and were tested for normal distribution using the 
Shapiro-Wilk test and for equal variance using the F-test. Normal distribution
and equal variance were confirmed in the large majority of data and, therefore, we
assumed normality and equal variance for all samples. Based on this, we used the 
Student’s t-test (two-tailed, unpaired) to estimate statistical significance. For con- 
tingency tables, we used the Fisher’s exact test. 
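
Two of the calculations described above, the sample-size check (accumulated N minus the number of groups T between 10 and 20) and Fisher's exact test for contingency tables, can be sketched in Python using only the standard library. This is an illustration and not the authors' analysis code: the function names are hypothetical, the two-sided P value follows the common convention of summing every table that is no more probable than the observed one, and the counts in the example are invented rather than taken from any figure.

```python
from math import comb


def mead_error_df(total_n, n_groups):
    """Error degrees of freedom E = N - T from the resource equation."""
    return total_n - n_groups


def mead_sample_size_ok(total_n, n_groups, low=10, high=20):
    """True if N - T falls within the recommended range (inclusive)."""
    return low <= mead_error_df(total_n, n_groups) <= high


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell
        # with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    p_value = sum(
        table_probability(x)
        for x in range(x_min, x_max + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Invented example: 24 animals in 4 groups, and a 2x2 table of
# "structure present / absent" counts for two hypothetical cell types.
print(mead_sample_size_ok(24, 4))
print(round(fisher_exact_two_sided(5, 0, 0, 5), 4))
```

For routine work an established statistics package would normally be preferred; the hand-written version is shown only to make the definition of the test explicit.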
transgenic mouse lines. a, Southern blot of tail tip genomic DNA digested with
BamHI and hybridized with specific probes for Sox2 and Klf4. b, Mice of the
indicated transgenic lines carrying the reprogramming transgene (+) or
without it (−) were treated with doxycycline (1 mg ml⁻¹) for 6 days. The
mRNA levels of Oct4 were determined by qRT-PCR. Values correspond to the
average and s.d. (n = 3 mice per transgenic line) and are relative to the levels of
wild-type mice treated with doxycycline. c, MEFs of the indicated mouse lines
were treated with doxycycline (1 µg ml⁻¹). Colonies of iPS cells in the i4F-A and
i4F-B plates were stained for alkaline phosphatase (AP) 10 days after induction.
In the case of i4F-C and i4F-D, plates were stained after 15 days but no iPS cell
colonies were observed. In parallel, total Oct4 mRNA levels were measured at
the indicated times by qRT-PCR. Values correspond to the average and s.d. For
i4F-A, i4F-B, and i4F-D, n = 3 MEF preparations; for i4F-C, n = 1.
d, Comparison of the in vitro reprogramming kinetics and efficiency of MEFs
from lines i4F-A and i4F-B. Reprogramming was induced with two different
protocols: 1 µg ml⁻¹ of doxycycline for 6 days, or continuous treatment with
1 µg ml⁻¹ of doxycycline. AP+ colonies were counted at the indicated times.
Values correspond to the average and s.d. (n = 3 independent MEF isolates per
line). In b and c, statistical significance was evaluated by the Student’s t-test
(unpaired, two-tailed): *P < 0.05, **P < 0.01, ***P < 0.001.
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Lentiviral insertion in i4F-A: Neto2 gene 
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Extended Data Figure 2 | Genomic insertion sites of lentiviral transgenes 

i4F-A and i4F-B and their effect on the host genes. a, Primers used for PCR to 
confirm insertion are shown in blue and underlined. These primers were used 
together with a common primer hybridizing to internal lentiviral sequences 
(see Methods). The 4 base pairs flanking the insertion site are duplicated upon 
lentiviral insertion and are underlined. A map of each gene is shown indicating 
with an arrow the approximate location of the lentiviral transgene. The pictures 
of PCR agarose gels correspond to the PCR products obtained with the flanking 
primer (underlined sequence in blue) and the internal lentiviral primer (not 
shown) (see Methods). b, The indicated tissues were used to measure the levels 
of Neto2 (host gene for the lentiviral transgene i4F-A) or Pparg (host gene for 
the lentiviral transgene i4F-B). Values correspond to the average and s.d. (n = 3 
mice per condition). Statistical significance was evaluated by Student’s t-test 
(unpaired, two-tailed). No significant differences were observed. 
[Panel label: Lentiviral insertion in i4F-B: Pparg gene. Sequence shown in panel: 
CAAGGCTCTGGGAATGAGATTTTCTGATGCCCAAGCTTTGCTGGACAAAGAATGATCCTTC] 
Extended Data Figure 3 | Histological alterations of the intestine and 
pancreas upon induction of i4F reprogrammable mice. Mice were treated 
with doxycycline (1 mg ml⁻¹) for 6 days. Haematoxylin and eosin (H&E) 
staining and immunohistochemistry of OCT4 in the intestine (a) and pancreas 
(b). Similar alterations were found in both lines, i4F-A and i4F-B. 
Extended Data Figure 4 | i4F induction leads to the appearance of tumoral 
masses and in situ reprogramming events. a, Reprogrammable mouse 
with multiple tumoral masses in the liver and kidneys (a representative example 
is shown from 15 mice analysed with teratomas). b, Incidence of other tumours 
in reprogrammable mice with teratomas. c, Three examples of NANOG-positive 
tubules in different induced reprogrammable mice. 
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Extended Data Figure 5 | Characterization of in vivo iPS cells. a, Expression 
of pluripotency markers in the indicated cell types. Data correspond to qRT- 
PCR from seven independent in vivo iPS cell clones, two in vitro iPS cell clones 
(no. 1: in vitro reprogrammed i4F MEFs; no. 2: in vitro reprogrammed wild- 
type MEFs infected with lenti-OSKM), and two ES cell clones (no. 1: 
C57BL6.10; no. 2: G4). Values correspond to the average ± s.d. of 3 technical 
replicates. b, Silencing of the lentiviral cassette in in vivo iPS cell clones. Upper 
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part, location of the PCR primers used. Lower part, lentiviral RNA levels in in 
vivo iPS cells (7 independent clones), in an in vitro iPS cell clone (in vitro 
reprogrammed i4F MEFs), in an ES cell line (C57BL6.10), and in i4F-MEFs 
induced with doxycycline for 3 days. Values correspond to the average ± s.d. of 
3 technical replicates. c, Chimaeric E14.5 testis generated with GFP-labelled in 
vivo iPS cells. Magnifications show germ cells derived from in vivo iPS cells. 
d, Summary of the isolation of in vivo iPS cells from the bloodstream. 
sequenced samples. The highest and the lowest coefficients are coloured in a blue 
to red gradient. b, Principal component analysis of the transcriptomes of in vivo 
iPS cells, in vitro iPS cells and ES cells. Data correspond to 6 clones of in vivo iPS 
cells, 5 clones of in vitro iPS cells, and 3 lines of ES cells (C57BL6.10, JM8.F6 and 
Bruce4). c, Upper part, scatter plots representing the expression of each gene in 
[…] types. Significant P values are in blue (that is, indicating differentially expressed 
genes). Non-significant P values are in red (that is, indicating genes that are not 
differentially expressed). Lower part, Pearson coefficient correlation among 
samples. Data correspond to 6 clones of in vivo iPS cells, 5 clones of in vitro iPS 
cells, and 3 lines of ES cells (C57BL6.10, JM8.F6 and Bruce4). 
Extended Data Figure 7 | Validation of RNA-seq data. a, Genes upregulated 
in in vivo iPS and ES cells versus in vitro iPS cells. b, Genes upregulated in in 
vivo iPS cells versus ES cells and in vitro iPS cells. Expression levels of the 
indicated genes in in vivo iPS cells (n = 6 clones), in vitro iPS cells (n = 5 clones) 
and ES cells (n = 3 lines C57BL6.10, JM8.F6 and Bruce4). A sample of RNA 
derived from a preparation of ~170 morulas was also included in b. Values 
correspond to the average ± s.d. Statistical significance was evaluated relative to 
in vitro iPS cells (a) or relative to in vivo iPS cells (b) by the Student's t-test 
(unpaired, two-tailed): *P < 0.05, **P < 0.01, ***P < 0.001. 
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Extended Data Figure 8 | In vivo iPS cell contribution to the trophectoderm 
lineage. a, Induction of trophectoderm markers (Fgfr2, Eomes) in the indicated 
cell types after culture in TS differentiation medium (see Methods) during 
the indicated period of time. Other markers were used as controls: Sox1 
(ectoderm), T (mesoderm) and Gata6 (endoderm). For each cell type, values 
are relative to the average levels at day 0. Values correspond to the average and 
s.d. For ES cells, n = 3 (lines C57BL6.10, JM8.F6 and Bruce4); for in vitro iPS 
cells, n = 5 clones; and for in vivo iPS cells, n = 5 clones. Statistical significance 
was determined using the Student’s t-test (unpaired, two-tailed): *P < 0.05, 
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**P < 0.01. The lower line of asterisks refers to the comparison with in vitro iPS 
cells, and the upper line of asterisks to the comparison with ES cells. b, Example 
of a chimaeric blastocyst derived from a Katushka morula injected with GFP- 
labelled in vivo iPS cells. Two different confocal planes are shown containing 
GFP-labelled cells that have contributed to the trophectoderm and to the inner 
cell mass, as indicated. c, Chimaerism of GFP-labelled in vivo iPS cells in the 
proper embryo and placenta (E14.5). A wild-type embryo at the same stage of 
development is shown as a control. Fluorescence pictures were taken with the 
same settings. 
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Extended Data Figure 9 | Expression levels of 2C marker genes. Analysis of 
the expression of genes enriched in the 2C state: the retrotransposable elements 
MuERV-L, Zscan4, and intracisternal A particles (IAP) showed no differences 
between in vivo iPS cells and either ES cells or in vitro iPS cells. For ES cells, 
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n = 3 (lines C57BL6.10, JM8.F6 and Bruce4); for in vitro iPS cells, n = 5 clones; 
and for in vivo iPS cells, n = 6 clones. Values correspond to the average and s.d. 
Statistical significance was determined using the Student’s t-test (unpaired, 
two-tailed). None of the differences was statistically significant. 
Extended Data Figure 10 | Immunohistochemical characterization of 
embryo-like structures. Haematoxylin and eosin and immunostaining 
analysis of two examples of embryo-like structures generated upon in vivo iPS 
cells intraperitoneal injection. The following markers were used: SOX2 
(ectoderm), T/BRACHYURY (mesoderm), GATA4 (endoderm), CDX2 
(trophectoderm), AFP and CK8 (visceral endoderm of the yolk sac). All lateral 
panels are at the same magnification. 
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Slowly fading super-luminous supernovae that are 
Super-luminous supernovae¹ that radiate more than 10⁴⁴ ergs per 
second at their peak luminosity have recently been discovered in 
faint galaxies at redshifts of 0.1-4. Some evolve slowly, resembling 
models of ‘pair-instability’ supernovae. Such models involve stars 
with original masses 140-260 times that of the Sun that now have 
carbon-oxygen cores of 65-130 solar masses. In these stars, the photons 
that prevent gravitational collapse are converted to electron-positron 
pairs, causing rapid contraction and thermonuclear explosions. Many 
solar masses of ⁵⁶Ni are synthesized; this isotope decays to ⁵⁶Fe via ⁵⁶Co, 
powering bright light curves. Such massive progenitors are expected 
to have formed from metal-poor gas in the early Universe. Recently, 
supernova 2007bi in a galaxy at redshift 0.127 (about 12 billion years 
after the Big Bang) with a metallicity one-third that of the Sun was 
observed to look like a fading pair-instability supernova. Here we 
report observations of two slow-to-fade super-luminous supernovae 
that show relatively fast rise times and blue colours, which are incompatible 
with pair-instability models. Their late-time light-curve and 
spectral similarities to supernova 2007bi call the nature of that event 
into question. Our early spectra closely resemble typical fast-declining 
super-luminous supernovae, which are not powered by radioactivity. 
Modelling our observations with 10-16 solar masses of 
magnetar-energized ejecta demonstrates the possibility of a common 
explosion mechanism. The lack of unambiguous nearby pair-instability 
events suggests that their local rate of occurrence is less than 6 × 10⁻⁶ 
times that of the core-collapse rate. 

The discovery of a luminous transient, PTF 12dam, was first reported 
by the Palomar Transient Factory on 23 May 2012. We recovered the 
transient in Pan-STARRS1 (Panoramic Survey Telescope and Rapid 
Response System) 3π survey data, between 13 and 29 April 2012, at right 
ascension (RA) 14 h 24 min 46.21 s and declination (dec.) +46° 13′ 48.66″. 
We triggered spectroscopic follow-up, beginning with Gran Telescopio 
Canarias and the William Herschel Telescope (23-25 May 2012). No 
traces of hydrogen or helium were visible, leading to a type Ic classification, 
and strong host galaxy lines provided a redshift measurement 
z = 0.107 (ref. 15). A second, similar transient, PS1-11ap, was discovered 
in the Pan-STARRS1 Medium Deep Survey on 2 January 2011 
(RA 10 h 48 min 27.72 s, dec. +57° 09′ 09.2″). Early spectra showed host 
galaxy emission lines at z = 0.523 (for details of the data, see Supplementary 
Information sections 1-3). 

The high luminosity and slow decline of their light curves (Fig. 1, 
Extended Data Tables 1-3, Extended Data Fig. 1) marked out PTF 
12dam and PS1-11ap as potential SN 2007bi-like events: that is, they 
could be pair-instability supernova (PISN) candidates discovered soon 
after explosion. SN 2007bi was discovered well after maximum light. 
Although the peak was recovered in the R band, the light-curve rise 
and early spectra were missed. Because of the long diffusion timescale 
associated with the very massive ejecta in PISN models, the time to reach 
maximum light (≳100 days) is a crucial observational test. The rise 
time for SN 2007bi was estimated at 77 days (ref. 1), but this was based 
on a parabolic fit to the data around the peak, and so was not well 
constrained. Our Pan-STARRS1 images reveal multiple early detections 
of PTF 12dam and PS1-11ap in g_P1, r_P1 and i_P1 bands at around 50 
and 35 rest-frame days before peak brightness, respectively (Extended 
Data Fig. 2). PTF 12dam is not detected in z_P1 images on 1 January 
2012, 132 days before the peak. Although their light curves match the 
declining phases of SN 2007bi and the PISN models quite well, PTF 
12dam and PS1-11ap rise to maximum light a factor of ~2 faster than 
these models. 

The spectra of PTF 12dam and PS1-11ap show them to be similar 
supernovae. After 50 days from the respective light curve peaks, these 
spectra are almost identical to that of SN 2007bi at the same epoch 
(Fig. 2, Extended Data Table 4, Extended Data Fig. 3). The blue colours 
are in stark contrast to the predictions of PISN models (Fig. 3, Extended 
Data Fig. 4), which show much cooler continua below 5,000 Å and 
marked drop-offs in the ultraviolet. Particularly around and after maximum 
light, PISN colours are expected to evolve to the red owing to 
increasing blanketing by iron group elements abundant in their ejecta. 
We see no evidence of line blanketing in our spectra, even down to 
2,000 Å (rest frame) in PS1-11ap, which suggests lower iron group 
abundances and a higher degree of ionization than in PISN models. 
Such conditions are fulfilled in models of ejecta reheated by magnetars 
(highly magnetic, rapidly rotating nascent pulsars). The pressure of 
the magnetar wind on the inner ejecta can form a dense shell at 
near-constant photospheric velocity. For PTF 12dam, the velocities of 
Figure 1 | Optical light curves of slow-fading super-luminous supernovae. 
Data for PTF 12dam (including discovery data announced by PTF) and SN 
2007bi (from refs 1 and 10) are given in the SDSS (Sloan Digital Sky Survey) 
r band (central wavelength λc = 6,230 Å), while for PS1-11ap at z = 0.523, the 
PS1 z_P1 filter corresponds to a rest-frame filter of λc = 5,680 Å (width of 
passband, 1,350 Å), similar to SDSS r. The first three PS1-11ap points 
were transformed from i_P1 using the observed i_P1 − z_P1 colour (see 
Supplementary Information sections 2 and 3 for details of the data, including 
k-corrections, colour transformations and extinction). The three supernovae 
(open symbols) display the same slow decline from maximum, matching the 
rate expected from ⁵⁶Co decay (dashed line) with close to full γ-ray trapping 
(although similar declines can be generated for ~100 days after peak from 
magnetar spin-down). Powering these high luminosities radioactively 
requires at least 3-7 M⊙ of ⁵⁶Ni (refs 1, 10, 18 and 20), suggesting an extremely 
massive progenitor and possible pair-instability explosion. Also shown are 
synthetic SDSS r-band light curves (solid lines) generated from published 
one-dimensional models of PISNs from 100-130 M⊙ stripped helium cores. 
These fit the decline phase well, but do not match our early observations. The 
rise time of a PISN is necessarily long (rising 2.5 mag to peak in 95-130 days), 
because heating from ⁵⁶Ni/⁵⁶Co decay occurs in the inner regions, and the 
resultant radiation must then diffuse through the outer ejecta, which typically 
has mass >80 M⊙ (ref. 7). Models with higher-dimensional outward mixing of 
⁵⁶Ni are likely to show even shallower gradients in the rising phase, while as-yet 
unexplored parameters such as rotation and magnetic fields will have little 
effect on the diffusion timescale, which is set by the mass, kinetic energy and 
opacity of the ejecta (see Supplementary Information section 5.2). The pre-peak 
photometry of PTF 12dam and PS1-11ap shows only a moderately slow rise 
over 50-60 days, which is therefore physically inconsistent with the PISN 
models. Error bars, ±1σ. 
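
The ⁵⁶Co decay slope invoked in the caption above can be converted into a fading rate in magnitudes. The sketch below assumes a ⁵⁶Co half-life of 77.2 days (a laboratory value that is not stated in the text) and full γ-ray trapping, so that the luminosity simply follows the exponential decay law. The function names are illustrative.

```python
import math

CO56_HALF_LIFE_DAYS = 77.2  # assumed half-life of 56Co; not given in the text


def decline_rate_mag_per_day(half_life_days=CO56_HALF_LIFE_DAYS):
    """Fading rate in magnitudes per day for L proportional to exp(-t / tau)."""
    tau = half_life_days / math.log(2)  # e-folding time of the decay
    # m = -2.5 log10(L) + const, so dm/dt = 2.5 log10(e) / tau
    return 2.5 * math.log10(math.e) / tau


def fade_in_mag(days, half_life_days=CO56_HALF_LIFE_DAYS):
    """Total fading in magnitudes after the given number of rest-frame days."""
    return decline_rate_mag_per_day(half_life_days) * days
```

Under these assumptions the tail fades by a little under 0.01 mag per day, or close to 1 mag per 100 rest-frame days. Incomplete γ-ray trapping would steepen the decline.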


spectral lines are close to 10,000 km s⁻¹ at all times. Intriguingly, the 
early spectra of our objects are very similar to those of superluminous 
supernovae of type I (refs 2, 11, 12) and evolve in the same way, but on 
longer timescales and with lower line velocities (Fig. 2). 

Nebular modelling of SN 2007bi spectra has been used to argue for large 
ejected oxygen and magnesium masses of 8-15 M⊙ and 0.07-0.13 M⊙, 
respectively (where M⊙ is the solar mass). Such masses are actually 
closer to values in massive core-collapse models than in PISN models, 
which eject ~40 M⊙ of oxygen and ~4 M⊙ of magnesium. In the work 
reported in ref. 1, an additional 37 M⊙ in total of Ne, Si, S, and Ar were 
added to the model, providing a total ejecta mass consistent with a 
PISN. However, this was not directly measured¹, because these elements 
lack any identified lines. These constraints are important, so we investigated 
line formation in this phase using our own non-local thermodynamic 
equilibrium code (Extended Data Fig. 5; Supplementary Information 
Figure 2 | Spectral evolution of PTF 12dam and PS1-11ap from super-luminous 
supernovae of type I to SN 2007bi-like. a-e, We show spectra of 
PTF 12dam, PS1-11ap, SN 2007bi, and the well-studied superluminous 
supernovae of type I, SN 2010gx, SN 2005ap and PTF09cnd. Our spectra 
have been corrected for extinction and shifted to respective rest frames 
(details of reduction and analysis, including construction of model host 
continua for subtraction from d and e, in Supplementary Information section 
3), and scaled to facilitate comparison. Phases are given in rest-frame days 
relative to maximum light. No hydrogen or helium is detected at any stage 
(near-infrared spectra of PTF 12dam, obtained at +13 days and +27 days, also 
show no He I; see Supplementary Information section 3). a, Before and around 
peak, our objects show the characteristic blue continua and O II absorptions 
common to super-luminous supernovae of type I/Ic, although the lines in 
the slowly evolving objects are at lower velocities than are typically seen in those 
events. b, Shortly after peak, Fe III features emerge, along with the Mg II and Ca II 
lines that dominate superluminous type I supernovae at this phase. c, By 
55 days after peak, PTF 12dam is almost identical to SN 2007bi. We note that 
these objects still closely resemble SN 2010gx, but seem to be evolving on longer 
timescales (consistent with the slower light-curve evolution). d, At ~100 days, 
PTF 12dam also matches PTF09cnd, which faded slowly for a superluminous 
type I supernova after a 50-day rise. e, The spectra are now quasi-nebular, 
dominated by emission lines of Ca H and K, Mg I] 4,571 Å, Mg I 5,183 Å + 
[Fe II] 5,200 Å blend, [O I] 6,300, 6,364 Å, [Ca II] 7,291, 7,323 Å, and O I 7,774 Å, 
but some continuum flux is still visible. We find that the emission line 
intensities can be reproduced by ejecta from a 15 M⊙ type I supernova at a few 
thousand degrees, without requiring a large mass of iron (Supplementary 
Information section 4). 


section 4). We found that the luminosities of [O I] 6,300, 6,364 Å, O I 
7,774 Å and Mg I] 4,571 Å, and the feature at 5,200 Å ([Fe II] + Mg I), 
can be reproduced with 10-20 M⊙ of oxygen-dominated ejecta, containing 
~0.001-1 M⊙ of iron, given reasonable physical conditions (singly 
ionized ejecta at a few thousand degrees). Thus, although the nebular 
modelling of SN 2007bi in ref. 1 provided a self-consistent solution for 
PISN ejecta, our calculations indicate that this solution is not unique, 
and has not ruled out lower-mass ejecta on the core-collapse scale 
(10 M⊙). Moreover, if the line at 5,200 Å is [Fe II], then both our model 
and the model of ref. 1 predict a dominant [Fe II] 7,155 Å line (at the 
low temperatures and high iron mass expected in PISN), which is not 
present in the observed spectra. To estimate the nickel mass needed to 
Figure 3 | Spectral comparison with pair-instability and magnetar-driven 
supernova models. a-c, We compare our ultraviolet and optical data to the 
predictions of PISN and magnetar models (lines in models are identified 
in refs 8 and 13). The absence of narrow lines and hydrogen/helium seems to 
make interaction-powered colliding-shell models unlikely (for example, the 
pulsational pair-instability; see Supplementary Information section 5.3). Model 
spectra are matched to the observed flux in the region 5,500-7,000 Å. a, We 
compare PS1-11ap to a Wolf-Rayet progenitor magnetar model (pm1p0) at 
peak light (model spectra at later epochs do not currently exist in the literature). 
The magnetar energy input is equivalent to several solar masses of ⁵⁶Ni, in ejecta 
of only 6.94 M⊙. The high internal-energy-to-ejecta-mass ratio keeps the ejecta 
hot and relatively highly ionized, resulting in a blue continuum to match our 
observations. Moreover, this energy source does not demand the high mass of 
metals intrinsic to the PISN scenario. Redward of the Mg II line at 2,800 Å, this 
model shows many of the same Fe III and O II lines dominating the observed 
spectra, although the strengths of the predicted Si III and C III lines in the 
near-ultraviolet are greater than those observed in PS1-11ap. We also compare 
PTF 12dam at peak to a 130 M⊙ He core PISN model. The model spectrum has 
intrinsically red colours below 5,000 Å owing to many overlapping lines from 
the large mass of iron-group elements and intermediate-mass elements. Our 
rest-frame ultraviolet spectra of PS1-11ap, and ultraviolet photometry of 
PTF 12dam, show that the expected line blanketing/absorption is not observed. 
b, PTF 12dam compared to models of 125-130 M⊙ PISNs at 55 days. 
Although the observed spectrum has cooled, the models still greatly 
under-predict the flux blueward of 5,000 Å. c, PS1-11ap, at 78 days, compared 
to 100-130 M⊙ PISN models at similar epochs. Again, our observations are 
much bluer than PISN models. In particular, PS1-11ap probes the flux below 
3,000 Å, where we see the greatest discrepancy. 


power PTF 12dam radioactively, we constructed a bolometric light 
curve from our near-ultraviolet to near-infrared photometry (Fig. 4). 
PTF 12dam is brighter than SN 2007bi, and fitting it with radioactively 
powered diffusion models requires ~15 M⊙ of ⁵⁶Ni in ~15-50 M⊙ 
of ejecta, combinations that are not produced in any physical model 
(Extended Data Fig. 6; Supplementary Information section 5.1). Furthermore, 
such large nickel fractions are clearly not supported by our spectra. 

The combination of relatively fast rises and blue spectra, lacking ultra- 
violet line blanketing, shows that PTF 12dam, PS1-11ap and probably SN 
2007bi are not pair-instability explosions. We suggest here one model 
that can consistently explain the data. A magnetar-powered supernova 
Figure 4 | Bolometric light curve and magnetar fit. Our PTF 12dam 
bolometric light curve (open circles), comprising Swift observations in the 
near-ultraviolet, extensive griz imaging, and multi-epoch near-infrared (JHK) 
data (Supplementary Information section 5), is well fitted by our semi-analytic 
magnetar model (black line) (see Supplementary Information section 5.4). 
This model, with magnetic field B ~ 10¹⁴ G and spin period P ~ 2.6 ms, can fit 
both the rise and decay times of the light curve. A large ejecta mass 
of ~10-16 M⊙ is required, significantly higher than typically found for type 
Ibc supernovae, but similar to the highest estimates for SN 2011bm and SN 
2003lw (though well below the >80 M⊙ expected in PISNs). In the context 
of the magnetar model, the parameters of our fit are consistent with the 
observed spectroscopic relation to super-luminous supernovae of type I. Fits to 
a sample of such objects using the same model found uniformly lower ejected 
masses and higher magnetic fields than in PTF 12dam. The large ejecta 
mass here results in a slow light-curve rise and broad peak compared to other 
super-luminous supernovae of type Ic, and would explain the slower 
spectroscopic evolution, including why the spectrum is not fully nebular at 
200 days. The weaker B field means that the magnetar radiates away its 
rotational energy less rapidly, so that more of the heating takes place at later 
times; this gives the impression of a radioactive tail. Higher ejected mass and 
weaker magnetar wind may account for the lower velocities in slowly declining 
events. Also shown for comparison are bolometric light curves of model 
PISNs from 80-130 M⊙ He cores (coloured lines). Although PISNs from less 
massive progenitors do show faster rise times, the rise of PTF 12dam is too 
steep to be consistent with the PISN explosion of a He core that is sufficiently 
massive to generate its observed luminosity. Error bars, ±1σ photometry, 
combined in quadrature. 


can produce a light curve with the observed rise and decline rates as 
the neutron star spins down and reheats the ejecta. It has been 
suggested that ~10% of core-collapses may form magnetars. Although 
their initial-spin distribution is unknown, periods ≳1 ms are physically 
plausible. This mechanism has already been proposed for SN 2007bi, as 
well as for fast-declining superluminous type-I supernovae. We 
fitted a magnetar-powered diffusion model to the bolometric light 
curve of PTF 12dam (Fig. 4), and found a good fit for magnetic field 
B ≈ 10¹⁴ G and spin period P ~ 2.6 ms, with an ejecta mass of ~10-16 M⊙. 
At peak, the r-band luminosities of PTF 12dam and PS1-11ap are ~1.5 
times that of SN 2007bi. Scaling our light curve by this factor, our 
model implies a similar ejected mass for SN 2007bi, with a slower-spinning 
magnetar (P ~ 3.3 ms), comparable to previous models. 
If the magnetar theory is correct for normal superluminous type-I 
supernovae, our objects could be explained as a subset in which 
larger ejected masses and weaker magnetic fields result in slower photometric 
and spectroscopic evolution. 
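
To give a feel for the energy budget implied by the fitted magnetar parameters, the sketch below evaluates the standard vacuum-dipole spin-down relations. It is not the fitting code used for Fig. 4. It assumes a neutron-star moment of inertia of 10⁴⁵ g cm² and a radius of 10 km, neither of which is stated in the text, and it returns only the power injected by the magnetar, not the emergent light curve, which also depends on diffusion through the ejecta. All function names are illustrative.

```python
import math

C_CM_PER_S = 2.998e10      # speed of light in cm/s
SECONDS_PER_DAY = 86400.0


def rotational_energy_erg(period_s, inertia_g_cm2=1e45):
    """Rotational energy E = I * Omega^2 / 2 (moment of inertia is an assumption)."""
    omega = 2.0 * math.pi / period_s
    return 0.5 * inertia_g_cm2 * omega ** 2


def spin_down_time_s(b_gauss, period_s, inertia_g_cm2=1e45, radius_cm=1e6):
    """Dipole spin-down time t = 6 I c^3 / (B^2 R^6 Omega^2); I and R are assumptions."""
    omega = 2.0 * math.pi / period_s
    return (6.0 * inertia_g_cm2 * C_CM_PER_S ** 3
            / (b_gauss ** 2 * radius_cm ** 6 * omega ** 2))


def magnetar_power_erg_per_s(t_s, b_gauss, period_s):
    """Injected power L(t) = (E / t_sd) / (1 + t / t_sd)^2 at time t after formation."""
    e_rot = rotational_energy_erg(period_s)
    t_sd = spin_down_time_s(b_gauss, period_s)
    return (e_rot / t_sd) / (1.0 + t_s / t_sd) ** 2
```

For B = 10¹⁴ G and P = 2.6 ms these assumptions give a rotational energy of roughly 3 × 10⁵¹ erg and a spin-down time of about a month. Because the spin-down time scales as B⁻², a weaker field spreads the heating over a longer period, which is the behaviour described in the caption of Fig. 4.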

This leaves no unambiguous PISN candidates within redshift z < 2 
(although possible examples exist at higher redshift). We used the properties 
of the Pan-STARRS1 Medium Deep Survey (PS1 MDS, with a 
nightly detection limit of ~23.5 mag in g,r,i-like filters) to constrain 
the local rate of stripped-envelope PISNs. We simulated PS1 MDS observations 
of 80, 100 and 130 M⊙ helium core PISN models using our own 
Monte Carlo code (Supplementary Information section 6), requiring 
an apparent magnitude <21 in at least one bandpass and a continuous 
100-day (observer-frame) window of PS1 monitoring before considering 
an event a candidate PISN detection. Initially assuming a rate of 
10⁻⁵ R_CCSN (where R_CCSN is the rate of occurrence of core-collapse 
supernovae) for each model, we typically find five 100 M⊙ PISN candidates 
per year, at z < 0.6. The 130 M⊙ explosions have peak near-ultraviolet 
magnitudes of −22, resulting in apparent r_P1 and i_P1 magnitudes <20. 
PS1 should detect >90% of these within z < 0.6 (ten or more per year). 
Taking the 100 M⊙ result, the fact that we have not detected a single 
transient with these properties in the three years of PS1 is inconsistent 
with our assumed explosion rate at a level of 3.9σ (Poisson statistics). 
This implies a 3σ upper limit on their rate (within z < 0.6) of <6 × 10⁻⁶ 
R_CCSN; even allowing another factor of ~2 to conservatively cover 
detection issues such as bad pixels or bright nearby stars, the rate of 
occurrence of super-luminous PISNs of type Ic must be at least a factor 
of ten lower than the overall rate of type-I superluminous supernovae. 
PS1-11ap was our best candidate for a PISN explosion, but it fails to 
match the models. However, our calculation suggests that almost all the 
lower-mass (80 M⊙) PISNs would escape detection. Future searches for 
PISN candidates should target these fainter explosions at lower redshift 
(and larger volumes), or the more luminous candidates at z > 1. 
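
The quoted significance and upper limit can be reproduced with a short calculation. The sketch below is one reading that matches the numbers in the text, not necessarily the authors' exact procedure: it assumes a trial rate of 10⁻⁵ R_CCSN, five expected candidates per year over three years, and a significance defined as the expected count divided by its Poisson standard deviation. Function names are illustrative.

```python
import math


def significance_of_null_result(expected_events):
    """Deficit of zero detections in units of the Poisson standard deviation, N / sqrt(N)."""
    return expected_events / math.sqrt(expected_events)


def rate_upper_limit(assumed_rate, expected_events, n_sigma=3.0):
    """Rate at which a null result would be exactly an n_sigma deficit.

    A null result is an n_sigma deficit when sqrt(N) = n_sigma, i.e. N = n_sigma^2,
    so the assumed rate is rescaled by n_sigma^2 / expected_events.
    """
    events_at_limit = n_sigma ** 2
    return assumed_rate * events_at_limit / expected_events


# Assumed inputs: five candidates per year for three years at a trial rate of 1e-5 R_CCSN
expected = 5 * 3
sigma = significance_of_null_result(expected)
limit = rate_upper_limit(1e-5, expected)

# Exact Poisson probability of seeing no events, shown for comparison only
p_zero = math.exp(-expected)
```

With 15 expected events this gives a significance of √15 ≈ 3.9 and a 3σ limit of 9/15 of the trial rate, that is 6 × 10⁻⁶ R_CCSN. The exact Poisson probability of zero events, e⁻¹⁵, is far smaller than a Gaussian 3.9σ tail, so this definition of significance is the conservative one.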

We conclude that the classification of some slow-fading super-luminous 
supernovae as radioactively driven is not supported observationally, 
and propose that these events can be united with virtually all 
known type-Ic super-luminous supernovae into a single class. Magnetar-powered 
models can explain their brightness and colours, and account 
for their diversity. The low upper limit we find for the rate of very massive 
PISNs reduces their potential impact on cosmic chemical evolution 
within z < 1. This relieves possible tension between their proposed 
existence in the nearby Universe, and the lack of detected chemical enrichment 
signatures in metal-poor stars and damped Lyman-α systems. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
Extended Data Figure 1 | Multi-colour photometry of PTF 12dam. 
Observed light curve of PTF 12dam in UVW2, UVM2, UVW1, u, g, r, i, z 
(AB magnitudes) and J, H, K (Vega magnitude system). 
Extended Data Figure 2 | Image subtraction for the three earliest 
Pan-STARRS1 epochs of PTF 12dam in g_P1, r_P1 and i_P1, using SDSS frames 
as reference images (taken on 11 February 2003). These illustrate reliable 
image subtraction, resulting in clear detections of PTF 12dam at early phases. 
The images on the left are our PS1 detections, those in the centre are the SDSS 
templates, and on the right are the differences between the two. The bright 
star in the lower right was saturated and hence does not subtract cleanly. At 
each PS1 epoch there are two images, taken as transient time interval pairs. 
Photometry was carried out and determined in the SDSS photometric system to 
match the bulk of the follow-up griz imaging. The white areas are gaps 
between the 590 × 598 pixel cells in the PS1 chip arrays. 
Extended Data Figure 3 | Spectral evolution of PTF 12dam. Full time-series 
optical and near-infrared spectroscopy of PTF 12dam, from two weeks before 
maximum light to an extended pseudo-nebular phase at 100 to >200 days 
afterwards. A Starburst99 model continuum spectral energy distribution 
for the host galaxy has been calibrated against SDSS and GALEX 
(Galaxy Evolution Explorer) photometry and subtracted from the last three 
spectra. RF, rest-frame. 
Extended Data Figure 4 | Effective temperature evolution of PTF 12dam 
and SN 2007bi, compared with magnetar-powered and pair-instability 
models. The magnetar model comes much closer to reproducing the high 
photospheric temperatures we observe, and matches the gradient of the decline 
phase well. PISN models do not reach such high effective temperatures, and 
show an approximately 100-day temperature plateau as they rise, before 
declining after maximum light. 
Extended Data Figure 5 | Modelling of the O I, Mg I and Fe II line fluxes in 
SN 2007bi at 367 days post-peak. We plot contours for oxygen, magnesium 
and iron line fluxes predicted by our model in units of L= 10*° ergs | 

(dark blue = L/3; light blue = L; red = 3L; where L is the approximate 
luminosity of the lines in the 367-day post-peak spectrum of SN 2007bi) as 
functions of the respective ion density, {n_O I, n_Mg I, n_Fe II}, and electron density, 
n_e, at 5,000 K (approximately the temperature derived for the iron zone from 
the relative strengths of iron lines). The panels for O I and Mg I show two 
lines (O I 6,300, 7,774 Å; Mg I 4,571, 5,180 Å), whereas Fe II shows only contours 
for the 5,200 Å blend. No blending is likely to occur for any of the oxygen lines; 
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the region where they intersect therefore gives the allowed densities, 
constraining 1, to about 10’ cm ° (this is quite insensitive to the temperature 
we assume). Blending is also unlikely for Mg1] 4,571 A, and the allowed Mg 
density is therefore the intersection of this contour with n, ~ 10?cm~°, 
which can be seen to give tigi S 10°cm *. At this magnesium density, we see 
that the Mg I 5,180 Å line makes some contribution to the 5,200 Å flux.

Also shown is the allowed Fe II density at this temperature, for iron-zone
electron densities spanning a factor of ten either side of that in the 
oxygen/magnesium zones. 


©2013 Macmillan Publishers Limited. All rights reserved 


[Plot: log L (erg s^-1) versus rest-frame days.]


Energy (erg)   Mass of 56Ni (M⊙)   Ejecta mass (M⊙)   Total rise time (days)   χ²/d.o.f.
10^51          14.1                14.7               88.9                     8.7
10^52          14.1                21.5               64.4                     1.8
10^53          16.1                52.2               64.4                     3.0


Extended Data Figure 6 | Fits to the observed bolometric light curve of PTF 12dam with radioactive 56Ni powered ejecta. The formal fits of the models with kinetic energies of 10^52 and 10^53 erg are good (see graph), but the required combinations of 56Ni masses and ejecta masses (see data table) are not produced in physical models; such large nickel fractions are only expected to be produced in thermonuclear explosions (supernova Ia or possibly PISN), whereas the total ejected mass corresponds to the core-collapse of a massive star below the pair-instability threshold.
Extended Data Table 1 | Optical photometry of PTF 12dam in SDSS griz bands, and k-corrections derived from our spectra.

Date  MJD  RF phase (days)  Telescope  g  K_g  r  K_r  i  K_i  z  K_z
2012-04-13  56030.48 -51.9 PS1 19.45 (0.17) 0.20 20.04(0.21)° 0.24 
2012-04-14 56031.45 -51.0 PS1 19.62 (0.15)> 0.11 
2012-04-25  56042.47 441.4 PS1 18.65 (0.14) 0.11 
2012-04-28  56045.48 -38.4 PS1 18.67 (0.21)° 0.24 
2012-04-29  56046.50 -37.5 PS1 18.30 (0.17)° 0.20 
2012-05-23 56071.13 -15.2 GTC+OSIRIS  17.09(0.01) 0.11 17.32(0.01) 0.20 17.53(0.01) 0.24 
2012-05-25  56072.92 -13.6 WHT+ACAM 17.13 (0.01) 0.11 17.26 (0.01) 0.19 17.59 (0.01) 0.24 17.52 (0.01) 0.14
2012-05-29  56076.95 -10.0 LT+RATCam 16.88 (0.01) 0.11 17.11(0.01) 0.16 17.38(0.01) 0.23 17.37(0.04) 0.14 
2012-06-02  56080.94 -6.4 LT+RATCam  16.84(0.01) 0.12 17.05(0.01) 0.14 17.34(0.02) 0.22 
2012-06-03  56081.94 5.5 LT+RATCam  16.84(0.01) 0.12 17.02(0.01) 0.13 17.33(0.01) 0.22 17.27(0.03) 0.14 
2012-06-20  56098.03 9.1 TNG + LRS 16.76 (0.01) 0.06 16.97(0.01) 0.19 17.24(0.01) 0.20 17.12(0.01) 0.11 
2012-06-25  56104.02 14.5 WHT+ACAM 17.00 (0.01) 0.03 17.05 (0.01) 0.21 17.28 (0.01) 0.19 17.15 (0.01) 0.10
2012-07-07 56116.01 25.3 NOT+ALFOSC = 17.05 (0.01) 4 *s 17.15 (0.01) 0.19  17.37(0.01) 0.18 17.24(0.01) 0.10 
2012-07-17 56128.03 36.2 GTC + OSIRIS 17.25 (0.05) 0.17 
2012-08-09  56148.93 55.0 NOT+ALFOSC 17.76 (0.01) 4) +~—'17.81 (0.01) 0.21 17.66 (0.01) 0.15 17.51(0.04) 0.07 
2012-08-21  56160.92 65.9 WHT +ACAM — 17.99 (0.01) 9 07 17.67 (0.01) 0.16 17.88(0.01) 0.14  17.53(0.01) 0.05 
2012-09-04 56174.85 78.5 NOT+ALFOSC = 18.22 (0.01) 4 03 17.86 (0.01) 0.10 17.99(0.01) 0.16 17.75(0.02) 0.05 
2012-09-21  56191.86 93.8 WHT+ACAM 18.58 (0.01) 0.01 18.21 (0.01) 0.03 18.20 (0.01) 0.19 17.95 (0.01) 0.02
2012-12-23  56285.21 178.1 LT+RATCam  19.84(0.06)° 0.24 19.27(0.02) 0.03 19.47(0.11)° 0.05 19.03(0.20)° 0.01 
2012-12-25  56287.63 180.3 FTN + FS02 19.30 (0.09) 0.03 19.49(0.01)° 0.05 19.12(0.11)° 0.03 
2013-01-19 56312.15 202.5 LT+RATCam 20.14 (0.05) 0.24 = 19.72 (0.04) 0.04. =. 20.14(0.20)° 0.01 ~—- 19.23 (0.24)° 0 - 
2013-01-27 56320.21 209.8 LT+RATCam 20.45 (0.10) 0.24 = 19.79 (0.05) 0.04. = 19.78 (0.14) 0.00 ~—- 19.35 (0.12)° ann 
2013-02-10 56334.17 222.4 LT + RATCam 19.96 (0.04) 0.04 
2003-02-11 52681.46 SDSS (host) 19.30 (0.01) 19.15 (0.01) 18.70 (0.01) 19.31 (0.07) 


Magnitudes have been corrected for host galaxy contamination; those labelled with a superscript ‘S’ were determined after image subtraction with SDSS templates (see Supplementary Information section 2.1). 
Extended Data Table 2 | Photometry of PTF 12dam outside the optical range.


Date  MJD  Phase  Telescope  UVW2  UVM2  UVW1  SDSS u*  J  H  K

2012-05-22 56070.38 45.9 SwifttUVOT cee ian Bd 16.86 (0.06) 

2012-05-26 56074.89 11.8 TNG+NICS pipe bigs Hee 
2012-05-30 6077.80 9.2 SwifttUVOT oan be aan 16.71 (0.06) 

2012-06-03 56082.06 5A TNG+NICS ba be hye 
2012-06-07 56085.69 24 SwiftsUVOT oe as ie 16.62 (0.07) 

2012-06-10  56089.03 0.9 NOT+NOTCam a ba phe 
2012-06-13 56091.67 3.3 SwifttUVOT ho Rip ne 16.58 (0.06) 

2012-06-20 — 56098.52 9.5 SwiftsUVOT noe) pp pe 16.80 (0.06) 

2012-06-27 56106.06 16.3 SwiftsUVOT be ae nae 16.99 (0.08) 

2012-06-28 56107.44 17.6 Swift+UVOT bro pope (ato) 17.00 (0.09) 

2012-07-04 56112.68 22.3 Swift+UVOT fol Ms “4 uaa 17.32 (0.08) 

2012-07-04 5613.05 22.6 NOT+NOTCam Pp Ree py 
2012-07-09 5611804 274 TNG+NICS press 

2012-08-01 56141.25 48.1 UKIRT+WECAM aw) 

2012-08-05 56145.03 51.5 NOT+NOTCam men ape 
2012-09-07 56177.04 80.4 NOT+NOTCam be as he 
2013-02-20 56343.00 230.4 NOT+NOTCam be - vid ipod 
2013-03-22 56374.04 258.4 NOT+NOTCam & = He nae 
2013-04-25  56407.00 288.2 NOT+NOTCam ean pes an 


Ultraviolet photometry in Swift UVOT (Ultraviolet and Optical Telescope) bands, and near-infrared photometry in JHK (for details of the data, see Supplementary Information section 2.1). 
*SDSS DR9 host magnitude: u = 19.67 (0.03). 
Extended Data Table 3 | Pan-STARRS1 photometry of PS1-11ap used in this work.


Date  MJD  Telescope  i_P1  z_P1

2010-12-31 55561.6 PS1 21.49 (0.02) 

2011-01-09 55570.55 PS1 21.06 (0.02) 

2011-01-15 55576.62 PS1 20.74 (0.02) 

2011-01-22 55583.52 PS1 20.78 (0.04) 
2011-01-24 55585.44 PS1 20.47 (0.01) 

2011-01-25 55586.61 PS1 20.70 (0.03) 
2011-01-28 55589.56 PS1 20.63 (0.03) 
2011-01-31 55592.58 PS1 20.58 (0.04) 
2011-02-03 55595.53 PS1 20.54 (0.02) 
2011-02-21 55613.44 PS1 20.38 (0.02) 
2011-03-11 55631.38 PS1 20.35 (0.03) 
2011-03-14 55634.32 PS1 20.41 (0.04) 
2011-03-26 55646.43 PS1 20.52 (0.03) 
2011-03-29 55649.50 PS1 20.56 (0.06) 
2011-04-22 55673.34 PS1 20.73 (0.03) 
2011-04-25 55676.34 PS1 20.81 (0.06) 
2011-05-01 55682.25 PS1 20.82 (0.04) 
2011-05-13 55694.28 PS1 20.90 (0.07) 
2011-05-22 55703.28 PS1 20.83 (0.06) 
2011-05-25 55706.26 PS1 20.99 (0.07) 
2011-05-31 55712.26 PS1 21.02 (0.03) 
2011-06-06 55718.26 PS1 21.08 (0.05) 
2011-12-30 55925.90 PS1 22.60 (0.14) 


The i_P1 magnitudes are transformed to z_P1 using the observed colour i − z = −0.18 at the earliest z point, MJD = 55583.52, with i linearly interpolated to this epoch (see Supplementary Information section 2.2).
Extended Data Table 4 | Log of spectra for PTF 12dam and the PS1-11ap spectra used in this work.


Date        MJD       RF phase (days)  Instrument                Grism/Grating  Range (Å)              Resolution (Å)
PS1-11ap
2011-02-22  55614     -1               WHT + ISIS                R300B; R158R   3150-10500             12
2011-06-22  55734     +78              GN + GMOS                 R150           4000-11000             23
PTF 12dam
2012-05-23  56070.99  -16              Asiago Copernico + AFOSC  Gr04           3400-8200              25
2012-05-24  56071.12  -15              GTC + OSIRIS              R300R          3500-10000             30
2012-05-25  56072.91  -14              WHT + ISIS                R300B; R158R   3250-5100; 5500-9500   3; 5
2012-05-26  56073.95  -13              TNG + NICS                IJ             8700-14500             35
2012-06-01  56079.96  -8               NOT + ALFOSC              Gr04           3500-9000              15
2012-06-08  56086.95  -2               NOT + ALFOSC              Gr04           3500-9000              15
2012-06-21  56100.04  +10              NOT + ALFOSC              Gr04           3700-9000              15
2012-06-25  56103.99  +14              WHT + ISIS                R300B; R158R   3200-5200; 5450-10000  6; 11
2012-06-29  56107.97  +18              Asiago Copernico + AFOSC  VPH6           3600-10000             15
2012-07-09  56117.99  +27              TNG + NICS                IJ             8700-13500             35
2012-07-17  56125.95  +34              Asiago Copernico + AFOSC  Gr04           3900-8140              13
2012-07-19  56128.03  +36              GTC + OSIRIS              R1000B         3600-7900              7
2012-08-09  56148.95  +55              NOT + ALFOSC              Gr04           3500-8200              20
2012-08-20  56159.92  +65              WHT + ISIS                R300B; R158R   3200-5300; 5450-10000  4; 7
2012-09-22  56192.86  +95              WHT + ISIS                R300B; R158R   3200-5300; 5450-10000  6; 11
2012-12-16  56277.5   +171             GN + GMOS                 B600; R400     3500-8900              3; 4
2013-02-10  56334.17  +221             WHT + ISIS                R300B; R158R   3200-5300; 5400-10000  5; 10


Spectral evolution of PTF 12dam is plotted in Extended Data Fig. 3. Full PS1-11ap time-series are to be presented elsewhere³¹.
31. McCrum, M. et al. The super-luminous supernova PS1-11ap: bridging the gap between low and high redshift. Mon. Not. R. Astron. Soc. (submitted).
Deterministic entanglement of superconducting
qubits by parity measurement and feedback


D. Risté, M. Dukalski, C. A. Watson, G. de Lange, M. J. Tiggelman, Ya. M. Blanter, K. W. Lehnert, R. N. Schouten & L. DiCarlo


The stochastic evolution of quantum systems during measurement is arguably the most enigmatic feature of quantum mechanics. Measuring a quantum system typically steers it towards a classical state, destroying the coherence of an initial quantum superposition and the entanglement with other quantum systems. Remarkably, the measurement of a shared property between non-interacting quantum systems can generate entanglement, starting from an uncorrelated state. Of special interest in quantum computing is the parity measurement, which projects the state of multiple qubits (quantum bits) to a state with an even or odd number of excited qubits. A parity meter must discern the two qubit-excitation parities with high fidelity while preserving coherence between same-parity states. Despite numerous proposals for atomic, semiconducting and superconducting qubits, realizing a parity meter that creates entanglement for both even and odd measurement results has remained an outstanding challenge. Here we perform a time-resolved, continuous parity measurement of two superconducting qubits using the cavity in a three-dimensional circuit quantum electrodynamics architecture and phase-sensitive parametric amplification. Using postselection, we produce entanglement by parity measurement reaching 88 per cent fidelity to the closest Bell state. Incorporating the parity meter in a feedback-control loop, we transform the entanglement generation from probabilistic to fully deterministic, achieving 66 per cent fidelity to a target Bell state on demand. These realizations of a parity meter and a feedback-enabled deterministic measurement protocol provide key ingredients for active quantum error correction in the solid state.

Recent advances in nearly quantum-limited amplification and improved qubit coherence times in three-dimensional circuit quantum electrodynamic (3D cQED) architectures have allowed investigations of the gradual collapse of single-qubit wavefunctions in the solid state, following previous fundamental studies in atomic systems. The continuous measurement of a joint property extends this study to the multiqubit setting, resolving the projection to states which are inaccessible via individual qubit measurements. In a two-qubit system, the ideal parity measurement transforms an unentangled superposition state $|\psi^0\rangle = (|00\rangle + |01\rangle + |10\rangle + |11\rangle)/2$ into the Bell states $|\Phi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$ and $(|00\rangle + |11\rangle)/\sqrt{2}$ for odd and even outcome, respectively. Beyond generating entanglement between non-interacting qubits, parity measurements allow deterministic two-qubit gates and play a key role as syndrome detectors in quantum error correction. A heralded parity measurement has been recently realized for nuclear spins in diamond. By minimizing measurement-induced decoherence at the expense of single-shot fidelity, Pfaff et al. generated highly entangled states with 3% success probability. Here we realize the first solid-state parity meter that produces entanglement with unity probability.
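The ideal projection described above can be checked numerically. The following is a minimal sketch, assuming the basis ordering |00>, |01>, |10>, |11> and perfect projectors onto the two parity subspaces; the names `parity_project`, `P_EVEN` and `P_ODD` are illustrative and do not come from the experiment.

```python
import numpy as np

# Basis ordering |00>, |01>, |10>, |11> (assumed for this sketch).
psi0 = np.ones(4, dtype=complex) / 2.0  # (|00> + |01> + |10> + |11>)/2

# Ideal projectors onto the even- and odd-parity subspaces.
P_EVEN = np.diag([1.0, 0.0, 0.0, 1.0])
P_ODD = np.diag([0.0, 1.0, 1.0, 0.0])


def parity_project(state, projector):
    """Return (outcome probability, normalized post-measurement state)."""
    unnormalized = projector @ state
    probability = float(np.vdot(unnormalized, unnormalized).real)
    return probability, unnormalized / np.sqrt(probability)


p_even, state_even = parity_project(psi0, P_EVEN)
p_odd, state_odd = parity_project(psi0, P_ODD)

print(p_even, p_odd)           # each outcome occurs with probability 0.5
print(np.round(state_odd, 3))  # amplitudes of (|01> + |10>)/sqrt(2)
```

Both outcomes leave the qubits in a maximally entangled state, which is why an ideal parity meter would generate entanglement for every measurement result.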


Our parity meter realization exploits the dispersive regime in two-qubit cQED. Qubit-state dependent shifts of a cavity resonance (here, the fundamental of a 3D cavity enclosing transmon qubits $Q_A$ and $Q_B$) allow joint qubit readout by homodyne detection of an applied microwave pulse transmitted through the cavity (Fig. 1a). The temporal average $V_{\mathrm{int}}$ of the homodyne response $V_P(t)$ over the time interval $[t_i, t_f]$ constitutes the measurement needle, with expectation value

$$\langle V_{\mathrm{int}} \rangle = \mathrm{Tr}(O\rho)$$

where $\rho$ is the two-qubit density matrix and the observable $O$ has the general form

$$O = \beta_0 + \beta_A \sigma_z^A + \beta_B \sigma_z^B + \beta_{BA} \sigma_z^B \sigma_z^A$$

where $\sigma_z^q$ is the Pauli operator for qubit $q$. The coefficients $\beta_0$, $\beta_A$, $\beta_B$ and $\beta_{BA}$ depend on the strength $\epsilon_P$, frequency $f_P$ and duration $\tau_P$ of the measurement pulse, and also on the cavity linewidth $\kappa$ and the frequency shifts $2\chi_A$ and $2\chi_B$ of the fundamental mode when $Q_A$ and $Q_B$ are individually excited from $|0\rangle$ to $|1\rangle$. The necessary condition for realizing a parity meter is $\beta_A = \beta_B = 0$ ($\beta_0$ constitutes a trivial offset). A simple approach, pursued here, is to set $f_P$ to the average of the resonance frequencies for the four computational basis states $|ij\rangle$ ($i, j \in \{0, 1\}$) and to match $\chi_A = \chi_B$. We engineer this matching by targeting specific qubit transition frequencies $f_A$ and $f_B$ below and above the fundamental mode during fabrication and using an external magnetic field to fine-tune $f_B$ in situ (Extended Data Fig. 1). We align $\chi_A$ to $\chi_B$ to within $\sim 0.06\kappa = 2\pi \times 90$ kHz (Fig. 1b). The ensemble-average $\langle V_P \rangle$ confirms nearly identical high response for odd-parity computational states $|01\rangle$ and $|10\rangle$, and nearly identical low response for the even-parity $|00\rangle$ and $|11\rangle$ (Fig. 1c). The transients observed are consistent with the independently measured $\kappa$, $\chi_A$ and $\chi_B$ values, and the 4-MHz bandwidth of the Josephson parametric amplifier (JPA; Fig. 1a) at the front end of the output amplification chain (see Extended Data Fig. 2 for a detailed schematic of the set-up). Single-shot histograms (Fig. 1d) demonstrate the increasing ability of $V_{\mathrm{int}}$ to discern states of different parity as $t_f$ grows (keeping $t_i = 0$), and its inability to discriminate between states of the same parity. The histogram separations at $t_f = 400$ ns give $|\beta_A|, |\beta_B| < 0.02\,|\beta_{BA}|$ (Extended Data Fig. 3).
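To make the parity condition concrete, the sketch below evaluates Tr(Oρ) for the four computational states. It assumes the sign convention σ_z|0> = +|0>, a tensor ordering with Q_A first, and the ideal case β_A = β_B = 0; the values of β_0 and β_BA are taken from Extended Data Fig. 3 purely for illustration, and `readout_operator` and `expected_needle` are hypothetical helper names.

```python
import numpy as np

SZ = np.diag([1.0, -1.0])  # Pauli z with sigma_z|0> = +|0> (assumed convention)
I2 = np.eye(2)


def readout_operator(b0, bA, bB, bBA):
    """O = b0 + bA*sz^A + bB*sz^B + bBA*sz^B*sz^A, with Q_A as the first tensor factor."""
    szA = np.kron(SZ, I2)
    szB = np.kron(I2, SZ)
    return b0 * np.eye(4) + bA * szA + bB * szB + bBA * (szB @ szA)


def expected_needle(O, rho):
    """Expectation value <V_int> = Tr(O rho)."""
    return float(np.trace(O @ rho).real)


# Ideal parity meter: single-qubit terms vanish (values in mV, illustrative).
O = readout_operator(b0=7.46, bA=0.0, bB=0.0, bBA=-6.25)

means = {}
for index, label in enumerate(["00", "01", "10", "11"]):
    rho = np.zeros((4, 4))
    rho[index, index] = 1.0  # computational basis state |label><label|
    means[label] = expected_needle(O, rho)

print(means)  # odd-parity states share one value, even-parity states another
```

With β_A = β_B = 0 the needle depends only on the product of the two σ_z eigenvalues, so states of equal parity cannot be told apart.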

Moving beyond the description of the measurement needle, we now investigate the collapse of the two-qubit state during parity measurement. We prepare the qubits in the maximal superposition state $|\psi^0\rangle = (|00\rangle + |01\rangle + |10\rangle + |11\rangle)/2$, apply a parity measurement pulse for $\tau_P$, and perform tomography of the final two-qubit density matrix $\rho$ with and without conditioning on $V_{\mathrm{int}}$ (Fig. 2a). We choose a weak parity measurement pulse exciting $\bar{n}_{\mathrm{ss}} = 2.5$ intra-cavity photons on average in the steady-state, at resonance. A delay of $3.5/\kappa = 350$ ns is inserted to deplete the cavity of photons before performing tomography. The tomographic joint readout is also carried out at $f_P$, but with 14 dB higher power, at which the cavity response is weakly nonlinear and sensitive to both single-qubit terms and two-qubit correlations ($\beta_A \sim \beta_B \sim \beta_{BA}$, see Extended Data Fig. 3), as required for tomographic reconstruction.
Figure 1 | Realization of cavity-based two-qubit parity readout in circuit QED. a, Simplified diagram of the experimental set-up. Single- and double-junction transmon qubits ($Q_A$ and $Q_B$, respectively) dispersively couple to the fundamental mode E of a 3D copper cavity enclosing them. The transition frequency of $Q_B$ is tuned by a static magnetic field B generated by an external coil. Parity measurement is performed by homodyne detection of the qubit state-dependent cavity response using phase-sensitive Josephson parametric amplification (JPA). Following further amplification at 4 K by a low-noise semiconductor amplifier (HEMT) and room temperature, the signal is demodulated and integrated. A field-programmable-gate-array (FPGA) controller closes the feedback loop that achieves deterministic entanglement by parity measurement (Fig. 4). b, Matching of the dispersive cavity shifts realizing a parity measurement. Different colours correspond to qubits prepared in $|00\rangle$ (black), $|01\rangle$ (blue), $|10\rangle$ (red) and $|11\rangle$ (purple). c, Ensemble-averaged homodyne response $\langle V_P \rangle$ for qubits prepared in the four computational basis states (same colour code). d, Curves, corresponding ensemble averages of the running integral $\langle V_{\mathrm{int}} \rangle$ of $\langle V_P \rangle$ between $t_i = 0$ and $t_f = t$. Single-shot histograms (5,000 counts each) of $V_{\mathrm{int}}$ are shown in 200-ns increments.


The ideal continuous parity measurement gradually suppresses the unconditioned density matrix elements $\rho_{ij,kl} = \langle ij|\rho|kl\rangle$ connecting states with different parity (either $i \neq k$ or $j \neq l$), and leaves all other coherences (off-diagonal terms) and all populations (diagonal terms) unchanged. The experimental tomography reveals the expected suppression of coherence between states of different parity (Fig. 2b, c). The temporal evolution of $|\rho_{11,10}|$, with near full suppression by $\tau_P = 400$ ns,
Figure 2 | Unconditioned two-qubit evolution under continuous parity measurement. a, Pulse sequence including preparation of the qubits in the maximal superposition state $\rho = |\psi^0\rangle\langle\psi^0|$, parity measurement and tomography of the final two-qubit state $\rho$ using joint readout. b, Absolute coherences $|\rho_{11,10}|$, $|\rho_{01,10}|$, $|\rho_{00,11}|$ following a parity measurement with variable duration $\tau_P$. Free parameters of the model are the steady-state photon number on resonance $\bar{n}_{\mathrm{ss}} = 2.5 \pm 0.1$, the difference $(\chi_A - \chi_B)/\pi = 235 \pm 4$ kHz, and the absolute coherence values at $\tau_P = 0$ to account for few-percent pulse errors in state preparation and tomography pre-rotations. Note that the frequency mismatch differs from that in Fig. 1b owing to its sensitivity to measurement power. Error bars, standard deviation of 15 repetitions of tomography, with 22,500 averages per set of pre-rotations. c, d, Extracted density matrices for $\tau_P = 0$ (c) and $\tau_P = 400$ ns (d), by which time coherence across the parity subspaces (grey columns) is almost fully suppressed, while coherence persists within the odd-parity (orange columns) and even-parity (green columns) subspaces. See Extended Data Fig. 4 for the temporal evolution with parity measurement off and Extended Data Figs 5 and 6 for two-qubit tomography at other values of $\tau_P$ and $\bar{n}_{\mathrm{ss}}$, respectively.



is quantitatively matched by a master-equation simulation of the two-qubit system (see Methods). Tomography also unveils a non-ideality: albeit more gradually, our parity measurement partially suppresses the absolute coherence between equal-parity states, $|\rho_{01,10}|$ and $|\rho_{00,11}|$. The effect is also quantitatively captured by the model. Although intrinsic qubit decoherence contributes (see Extended Data Fig. 4 for quantitative details), the dominant mechanism is the different a.c. Stark phase shift induced by intra-cavity photons on basis states of the same parity. This phase shift has both deterministic and stochastic components, and the latter suppresses absolute coherence under ensemble averaging. We emphasize that this imperfection is technical rather than fundamental. It can be mitigated in the odd subspace by perfecting the
Figure 3 | Probabilistic entanglement generation by postselected parity measurement. a, Histograms of $V_{\mathrm{int}}$ ($\tau_P = 300$ ns) for the four computational states. The results are digitized into $M_P = +1\,(-1)$ for $V_{\mathrm{int}}$ below (above) a chosen threshold. Coloured areas indicate the choice of thresholds $V_{\mathrm{th}+}$ and $V_{\mathrm{th}-}$ for $M_P = +1$ (green) and $M_P = -1$ (orange), respectively. b, Parity readout fidelity $F_P$ as a function of $\tau_P$. We define $F_P = 1 - \epsilon_e - \epsilon_o$, with $\epsilon_e = p(M_P = -1\,|\,\mathrm{even})$ the readout error probability for a prepared even state, and similarly for $\epsilon_o$, using the $F_P$-maximizing single threshold $V_{\mathrm{th}}$ (see also Extended Data Fig. 7). Data are corrected for residual qubit excitations (see Methods). Error bars (smaller than the dot size) are the standard deviation obtained from 15 sets of histograms, each with 22,500 counts per prepared state. Model curves are obtained from 5,000 quantum trajectories for each initial state and $\tau_P$, with quantum efficiencies $\eta = 0.25$, 0.5 and 1 for the readout amplification chain (see Methods). No single value of $\eta$ matches the dependence


matching of $\chi_B$ to $\chi_A$, and in the even subspace by increasing $\chi_{A,B}/\kappa$ (~1.3 in this experiment).

The ability to discern parity subspaces while preserving coherence within each opens the door to generating entanglement by parity measurement on $|\psi^0\rangle$. For every run of the sequence in Fig. 2, we discriminate $V_{\mathrm{int}}$ using the threshold $V_{\mathrm{th}}$ that maximizes the parity measurement fidelity $F_P$ (Fig. 3a). Assigning the parity measurement result $M_P = +1\,(-1)$ to $V_{\mathrm{int}}$ below (above) $V_{\mathrm{th}}$, we bisect the tomographic measurements into two groups, and obtain the density matrix for each. We quantify the entanglement achieved in each case using concurrence $C$ as the metric, which ranges from 0% for an unentangled state to 100% for a Bell state. As $\tau_P$ grows (Fig. 3b), the optimal balance between increasing $F_P$ at the cost of measurement-induced dephasing and intrinsic decoherence is reached at ~300 ns (Fig. 3c). Postselection on $M_P = \pm 1$ achieves $C_{|M_P=-1} = 45 \pm 3\%$ and $C_{|M_P=+1} = 17 \pm 3\%$, with each case occurring with probability $P_{\mathrm{success}} \approx 50\%$. The higher performance for $M_P = -1$ results from lower measurement-induced dephasing in the odd subspace, consistent with Fig. 2.
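The threshold selection can be sketched as follows. This is a minimal illustration on synthetic data, assuming Gaussian single-shot distributions with arbitrary means and widths (not measured values); `parity_fidelity` and `best_threshold` are hypothetical helper names, and the error definitions follow the Fig. 3 caption.

```python
import numpy as np


def parity_fidelity(v_even, v_odd, threshold):
    """F_P = 1 - eps_e - eps_o, with M_P = +1 assigned below threshold and -1 above."""
    eps_e = np.mean(v_even > threshold)   # even state prepared, read as M_P = -1
    eps_o = np.mean(v_odd <= threshold)   # odd state prepared, read as M_P = +1
    return 1.0 - eps_e - eps_o


def best_threshold(v_even, v_odd, candidates):
    """Scan candidate thresholds and return (threshold, fidelity) at the maximum."""
    fidelities = np.array([parity_fidelity(v_even, v_odd, t) for t in candidates])
    best = int(np.argmax(fidelities))
    return float(candidates[best]), float(fidelities[best])


# Synthetic single-shot records (arbitrary units): low response for even
# parity, high response for odd parity, as in Fig. 1c.
rng = np.random.default_rng(0)
v_even = rng.normal(0.0, 1.0, 20_000)
v_odd = rng.normal(3.0, 1.0, 20_000)

v_th, f_p = best_threshold(v_even, v_odd, np.linspace(-2.0, 5.0, 141))
print(v_th, f_p)
```

Raising the threshold above this optimum lowers the even-state error at the cost of discarding runs, which is the trade-off behind the stricter postselection thresholds.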

The entanglement achieved by this probabilistic protocol can be increased with more stringent postselection. Setting a higher threshold $V_{\mathrm{th}-}$ achieves $C_{|M_P=-1} = 77 \pm 2\%$ but keeps $P_{\mathrm{success}} \approx 20\%$ of runs. Analogously, using $V_{\mathrm{th}+}$ achieves $C_{|M_P=+1} = 29 \pm 4\%$ with similar $P_{\mathrm{success}}$ (Fig. 3d, e). However, increasing $C$ at the expense of reduced $P_{\mathrm{success}}$ is not evidently beneficial for quantum information processing. For the many tasks calling for maximally-entangled qubit pairs (ebits), one may use an optimized distillation protocol to prepare one ebit from $N = 1/E_N(\rho)$ pairs in a partially-entangled state $\rho$, where $E_N$ is the logarithmic negativity. The efficiency $\mathcal{E}$ of ebit generation would be $\mathcal{E} = P_{\mathrm{success}} E_N(\rho)$. For postselection on $M_P = -1$, we calculate $\mathcal{E} = 0.31$ ebits per run using $V_{\mathrm{th}}$ and $\mathcal{E} = 0.20$ using $V_{\mathrm{th}-}$. Evidently, increasing entanglement at the expense of reducing $P_{\mathrm{success}}$ is counterproductive in this context.
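The two entanglement measures and the efficiency can be computed directly from a two-qubit density matrix. The sketch below uses the standard Wootters formula for concurrence and the trace norm of the partial transpose for the logarithmic negativity; the Werner-type test state and the success probability of 0.5 are illustrative assumptions, not data from the experiment.

```python
import numpy as np


def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy)
    rho_tilde = yy @ rho.conj() @ yy
    eigvals = np.linalg.eigvals(rho @ rho_tilde)
    lam = np.sort(np.sqrt(np.clip(eigvals.real, 0.0, None)))[::-1]
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))


def log_negativity(rho):
    """E_N = log2 of the trace norm of the partial transpose (second qubit)."""
    rho_pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return float(np.log2(np.sum(np.abs(np.linalg.eigvalsh(rho_pt)))))


def ebit_efficiency(p_success, rho):
    """Efficiency P_success * E_N(rho), in ebits per run."""
    return p_success * log_negativity(rho)


# Illustrative state: odd-parity Bell state mixed with white noise.
bell = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
p = 0.6
rho_noisy = p * rho_bell + (1 - p) * np.eye(4) / 4

print(concurrence(rho_noisy), log_negativity(rho_noisy))
print(ebit_efficiency(0.5, rho_noisy))
```

Because the efficiency is a product, a modest gain in $E_N$ from stricter postselection can be outweighed by the drop in success probability.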
of $F_P$ on $\tau_P$. We attribute this discrepancy to low-frequency fluctuations in the parametric amplifier bias point, not included in the model. c, Concurrence $C$ of the two-qubit entangled state obtained by postselection on $M_P = -1$ (orange) and on $M_P = +1$ (green squares). Open symbols correspond to the threshold $V_{\mathrm{th}}$ that maximizes $F_P$, binning $P_{\mathrm{success}} \approx 50\%$ of the data (from Fig. 2) into each case. Filled symbols correspond to a threshold $V_{\mathrm{th}-}$ ($V_{\mathrm{th}+}$) for postselection on $M_P = -1$ ($+1$), at which the corresponding readout error probability is 0.01. Concurrence is optimized at $\tau_P \approx 300$ ns, where $P_{\mathrm{success}} \approx 20\%$ in each case. We use maximum-likelihood estimation (MLE) to ensure physical density matrices, but concurrence values obtained with and without MLE differ by less than 3% over the full data set. Error bars, standard deviation of the concurrence obtained from 15 repetitions of tomography. d, e, State tomography conditioned on $V_{\mathrm{int}} > V_{\mathrm{th}-}$ (d) and $V_{\mathrm{int}} < V_{\mathrm{th}+}$ (e), with $\tau_P = 300$ ns, corresponding to the dark symbols in c. The fidelity of these states to the closest Bell state is 88% (d) and 64% (e).


Motivated by the above observation, we finally demonstrate the use of feedback control to transform entanglement by parity measurement from probabilistic to deterministic, that is, $P_{\mathrm{success}} = 100\%$. Whereas initial proposals in cQED focused on analogue feedback schemes, here we adopt a digital strategy. Specifically, we use a custom-built programmable controller to apply a $\pi$ pulse on $Q_A$ conditional on measuring $M_P = +1$ (using $V_{\mathrm{th}}$, Fig. 4). In addition to switching the two-qubit parity, this pulse lets us choose which odd-parity Bell state to target by selecting the phase $\varphi$ of the conditional pulse. To optimize deterministic entanglement, we need to maximize overlap to the same odd-parity Bell state for $M_P = -1$ (Fig. 4b) as for $M_P = +1$ (Fig. 4c). For the targeted state $|\Phi^+\rangle$, this requires cancelling the deterministic a.c. Stark phase $\varphi_e = 0.73\pi$ accrued between $|00\rangle$ and $|11\rangle$ when $M_P = +1$. This is accomplished by choosing $\varphi = (\pi - \varphi_e)/2$, which clearly maximizes the entanglement obtained when no postselection on $M_P$ is applied (Fig. 4c, d). The highest deterministic $C = 34\%$ achieved is lower than for our best probabilistic scheme, but the boost to $P_{\mathrm{success}} = 100\%$ achieves a higher $\mathcal{E} = 0.41$ ebits per run.
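The role of the pulse phase can be illustrated with ideal unitaries. The sketch below is built on assumptions that this section does not state: the $\pi$ pulse is taken as a rotation about the in-plane axis $\cos\varphi\,\sigma_y - \sin\varphi\,\sigma_x$ (a convention chosen here so that the quoted expression $\varphi = (\pi - \varphi_e)/2$ cancels the phase), $Q_A$ is the first tensor factor, and decoherence is ignored. The function names are hypothetical.

```python
import numpy as np


def pi_pulse(phi):
    """pi rotation about the axis cos(phi)*sigma_y - sin(phi)*sigma_x (assumed convention)."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    axis = np.cos(phi) * sy - np.sin(phi) * sx
    return -1j * axis  # exp(-i*pi/2 * axis), valid because axis @ axis = identity


def overlap_with_odd_bell(phi, phi_e):
    """|<Phi+| U_A(phi) |even>|^2 for the even state [|00> + exp(-i phi_e)|11>]/sqrt(2)."""
    even = np.array([1, 0, 0, np.exp(-1j * phi_e)]) / np.sqrt(2)
    feedback = np.kron(pi_pulse(phi), np.eye(2))  # pulse acts on Q_A only
    target = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)
    return float(abs(np.vdot(target, feedback @ even)) ** 2)


phi_e = 0.73 * np.pi                # deterministic a.c. Stark phase from the text
phi_opt = (np.pi - phi_e) / 2       # pulse phase quoted in the text

phases = np.linspace(0.0, np.pi, 1001)
overlaps = [overlap_with_odd_bell(phi, phi_e) for phi in phases]
print(phases[int(np.argmax(overlaps))], phi_opt)
```

Under this convention the overlap varies as $\sin^2[(\varphi_e + 2\varphi)/2]$, so it reaches unity at $\varphi = (\pi - \varphi_e)/2$ and vanishes a quarter turn away; a different phase reference would shift the optimum by a constant offset.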

Our experiment extends the fundamental study of continuous measurement in superconducting circuits to the multiqubit scenario, providing a test-bed for the investigation of wavefunction projection and induced dephasing. Furthermore, the implemented parity meter generates entanglement for any measurement result, making it suitable for deterministic quantum information processing protocols. Specifically, the combination of parity measurement with digital feedback realizes a multiqubit measurement-based protocol in the solid state made deterministic through feedback, as achieved with photonic, ionic and atomic systems. Future experiments will target the complementary use of analogue feedback control to combat measurement-induced dephasing. Integration of analogue feedback will refine the control over quantum measurement and feedback required to extend quantum coherence by active control methods.
Figure 4 | Deterministic entanglement generation using feedback. a, We close a digital feedback loop by triggering (via the FPGA) a $\pi$ pulse on $Q_A$ conditional on the parity measurement result $M_P = +1$. This $\pi$ pulse switches the two-qubit parity from even to odd, and allows the deterministic targeting of $|\Phi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$. b, c, Full state tomography (with real (imaginary) part of the density matrix on the left (right)) for parity measurement results $M_P = -1$ (b) and $M_P = +1$ (c), each occurring with ~50% probability. The deterministic a.c. Stark phase acquired between $|01\rangle$ and $|10\rangle$ during parity measurement (due to residual mismatch between $\chi_A$ and $\chi_B$) is compensated by a global phase rotation in the tomography pulses. A different a.c. Stark phase is


METHODS SUMMARY 

Device parameters. Lorentzian best fits to cavity transmission (Fig. 1b) yield $\kappa = \kappa_{\mathrm{out}} + \kappa_{\mathrm{in}} = 2\pi \times (1.56 \pm 0.01\ \mathrm{MHz})$ and $\{\chi_A, \chi_B\}/\pi = \{-4.03 \pm 0.02, -4.21 \pm 0.02\}$ MHz. From room-temperature characterization, we estimate asymmetric output/input couplings $\kappa_{\mathrm{out}}/\kappa_{\mathrm{in}} \approx 8$. The qubits have transition frequencies $\{f_A, f_B\} = \{5.52, 7.80\}$ GHz, relaxation times $\{T_1^A, T_1^B\} = \{22, 7\}$ μs, and pure dephasing times $\{T_\varphi^A, T_\varphi^B\} = \{11, 8\}$ μs (see also Extended Data Fig. 8). Using the method detailed in ref. 31, we estimate a residual excitation of 1% (2%) for $Q_A$ ($Q_B$).

Readout signal processing. In Fig. 1b we probe the cavity with a pulse ($\bar{n}_{\mathrm{ss}} \approx 1.4$) at variable frequency, after preparing the qubits in one of the four computational states. The cavity transmission is acquired with homodyne detection at 10 MHz intermediate frequency. In Fig. 1c and d the cavity response ($\bar{n}_{\mathrm{ss}} = 2.5$), first amplified by the JPA, is demodulated with 0 intermediate frequency (measurement, local oscillator, and pump tones are provided by the same generator). For each shot, the average homodyne signal over a 2.5 μs window preceding state preparation is subtracted. This subtraction mitigates the infiltration of low-frequency fluctuations in the JPA bias. In Figs 2-4, $t_i = 100$ ns and $t_f = \tau_P + 150$ ns, experimentally found to maximize $F_P$. Similarly, an offset integrated over 2.5 μs is subtracted from each $V_{\mathrm{int}}$ (Extended Data Fig. 9).

Model. The system is described by the dispersive Hamiltonian”: 


iS Xt) .¥ 5! 


q=AB q=AB 


H/h= Ge 
+ ep [ate Hm + ae tiont] 


The cavity-mediated qubit-qubit interaction J (oF a8 +03 of ) is disregarded, as 
J vanishes for 14 =A’g. We model the evolution of p following the method of 
acquired between $|00\rangle$ and $|11\rangle$, resulting in the state shown in c, with the maximal overlap with even Bell state $[|00\rangle + \exp(-i\varphi_e)|11\rangle]/\sqrt{2}$ at $\varphi_e = 0.73\pi$. d, Generation rate of entanglement using feedback, as a function of the phase $\varphi$ of the $\pi$ pulse. The deterministic entanglement generation efficiency outperforms the efficiencies obtained with postselection (Fig. 3). Error bars, standard deviation of seven repetitions of the experiment at each $\varphi$. e, Full state tomography for deterministic entanglement [$\varphi = (\pi - \varphi_e)/2$], achieving fidelity $\langle\Phi^+|\rho|\Phi^+\rangle = 66\%$ to the targeted $|\Phi^+\rangle$, and concurrence $C = 34\%$. Coloured bars highlight the contribution from cases $M_P = -1$ (orange) and $M_P = +1$ (green).


quantum trajectories in refs 8, 9, 21 and 32. The stochastic master equation, valid 
for t<T4,T8, is: 


De 


q=A,B 


= >. Vij. (im [jan] +iRe [xj] ) Typ UM ydt 
ikl 


1 1 
(Plot + pga [etl a 


+ /xiM [T1,e7'*] pdW(t) 


with operators I]; = 


j= li) Gl and T,=})ajIlj, super-operators D[O]p= 
ij 


1 . 
p09" — >{0'0.p} and M[0|p=Op+p0'—(0+6")p. Here, @ is the 


homodyne-detection phase set by the JPA pump, and jx) = —Vy, where 
X= (ij| » . X,03|ij). The dynamics of a; in the frame rotating at cop is given by: 
q=A, 


. . . K 
dy = — iep(t) —i(@, — wp + Xi) uy — = 


z°8 


dW is the noise in the homodyne record: 
Vp(t)dt oc , /in(T,e~ + The”\ de +dw 


Quantum trajectories are unravelled by numerically solving equation (1) with 
dt = 1 ns and a Wiener white-noise process dW (zero mean, variance dt) gener- 
ated pseudo-randomly. For each trajectory, Vint is obtained using the same integ- 
ration and offset-subtraction parameters as in the experiment. The unconditioned 
p is obtained by solving equation (1) without the last term. 
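The noise generation and integration step can be sketched in isolation. The example below only illustrates how Wiener increments with zero mean and variance dt might be drawn and how a record is time-averaged over a window; it does not solve the stochastic master equation, and the constant signal level and helper names are assumptions made for illustration.

```python
import numpy as np


def simulate_record(signal, dt, rng):
    """Return increments V*dt = signal*dt + dW, with dW ~ Normal(0, dt) per step."""
    dW = rng.normal(0.0, np.sqrt(dt), size=signal.shape)
    return signal * dt + dW


def time_average(increments, dt, i_start, i_stop):
    """Time average of V over the steps [i_start, i_stop)."""
    return float(increments[i_start:i_stop].sum() / ((i_stop - i_start) * dt))


rng = np.random.default_rng(1)
dt = 1.0                          # time step in ns, as quoted in the text
signal = np.full(100_000, 0.2)    # arbitrary constant mean signal (illustrative)

increments = simulate_record(signal, dt, rng)
v_int = time_average(increments, dt, 0, signal.size)
print(v_int)  # fluctuates around the mean signal; the spread shrinks with window length
```

The standard deviation of the averaged record scales as one over the square root of the integration window, which is why longer integration separates the parity histograms more cleanly.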
Extended Data Figure 1 | Spectroscopy of the two-qubit and cavity system. The transition frequency of $Q_B$ is tuned by applying magnetic flux through its SQUID loop with an external coil. $Q_A$ ($f_A = 5.52$ GHz) is a single-junction transmon and thus not tunable. $Q_B$ was designed tunable to allow trimming of the dispersive-shift matching condition. However, the maximal frequency of $Q_B$ ($f_B = 7.80$ GHz) is still approximately 20 MHz lower than needed for a perfect match of dispersive shifts. Thus, we flux bias $Q_B$ at this maximal frequency, which is also optimal for coherence. Inset, higher resolution spectroscopy of the avoided crossing of $Q_B$ with the cavity fundamental mode ($f_r = 6.55$ GHz), revealing a minimum splitting of 167 MHz.
Extended Data Figure 2 | Detailed schematic of the experimental set-up. Complete wiring of electronic components outside and inside the ³He/⁴He dilution refrigerator (Leiden Cryogenics CF-650). Readout and qubit-drive pulses, shaped by a Tektronix AWG5014 and two Tektronix AWG520, enter the cavity via a single transmission line. The cavity output is reflected by the JPA, which is biased by a superconducting coil and a strong pump tone, bending its resonance down to $f_P$ and providing parametric amplification. The signal is further amplified at the 3 K stage (Caltech Cryo1-12, 0.06 dB noise figure) and at room temperature (two Miteq AFS3-04000800-10-ULN amplifiers, 0.8 dB noise figure). Demodulation to baseband is provided by a generator at $f_P$, also used for readout and pump. Two phase shifters allow adjusting the relative phase between the three tones at $f_P$. The demodulated signal is split into three separate arms after amplification by a Stanford Research Systems SR445A. One arm stabilizes the JPA flux bias via an ADwin-GOLD processor programmed as a PID controller. In the second arm, the signal is filtered by a bias tee, amplified with a custom-built amplifier, and integrated and thresholded by the FPGA. The FPGA conditionally triggers a $Q_A$ $\pi$ pulse from an AWG520 (Fig. 4). The third arm connects to an AlazarTech ATS9870 digitizer for data storage and processing after a second SR445A amplification stage. Red colour highlights the key components of the feedback loop.
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Extended Data Figure 3 | Readout configuration for parity measurement 
and state tomography. a, Histograms for the computational basis for the
parity measurement M_p (τ_p = 300 ns, n̄_ss = 2.5), as in Fig. 3a. At this
measurement power, states within each parity subspace are largely
indistinguishable (see also Fig. 1b and Extended Data Fig. 7). For an ideal parity
measurement, β_A = β_B = 0. We extract β_A = 0.0146 mV, β_B = −0.123 mV,
β_BA = −6.25 mV and β_0 = 7.46 mV. b, Histograms for the tomography
measurement (integration time 850 ns, n̄_ss ≈ 60). At this power, the cavity
response is nonlinear (critical photon number n_crit ≈ 60), causing the
resonance for |10⟩ to bend towards lower frequency. As the resonance for |01⟩
is instead power-independent, this effect discriminates |01⟩ from the other
states. This gives the joint readout the sensitivity to single and two-qubit terms
required to perform state tomography. Averaging of raw tomography
measurements yields β_A = −8.10 mV, β_B = 9.10 mV, β_BA = −12.8 mV and
β_0 = 17.1 mV. Digitizing the single shots with threshold Vp = 32 mV gives
β_A = 0.424, β_B = −0.360, β_BA = 0.379 and β_0 = 0.540.
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Extended Data Figure 4 | Temporal evolution of two-qubit superposition 
state with and without continuous parity measurement. Comparison of the 
unconditioned two-qubit evolution during parity measurement (filled symbols, 
same data as in Fig. 2b) and during a delay of the same duration τ_p (open
symbols). In the latter case, the decay of |ρ_ij,kl| is solely due to intrinsic qubit
decoherence. Evidently, measurement-induced dephasing dominates over 
intrinsic qubit dephasing. For reference, we estimate that entanglement (C > 0) 
would be achieved in the deterministic scheme provided the net qubit 
dephasing rate 1 / ThA +1 / TS" <1/0.4 ps", under the experimental 
conditions of Fig. 4. 
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Extended Data Figure 5 | Two-qubit evolution under continuous parity 
measurement. Unconditioned and conditioned state tomography of the final 
two-qubit state similar to Figs 2 and 3, but at more values of τ_p and using the
threshold V_th optimizing parity readout fidelity (n̄_ss = 2.5). Middle row,
unconditioned evolution. For τ_p = 0, there is only a 10 ns buffer between state
preparation and tomography, instead of the 350 ns used in Figs 2-4 and all
other τ_p values here. The uniformity of |ρ_ij,kl| for τ_p = 0 (<4% relative
difference) attests to the preparation fidelity of the initial maximal
superposition state. Top row, evolution conditioned on V_int > V_th (M_p = −1);
bottom row, evolution conditioned on V_int < V_th (M_p = +1).

Extended Data Figure 6 | Two-qubit unconditioned evolution and
conditioned concurrence for different measurement strengths.
a–f, Experiment as in Figs 2 and 3 with measurement strength corresponding to
n̄_ss = 0.6 ± 0.1 (a, d), 1.4 ± 0.1 (b, e) and 3.9 ± 0.1 (c, f). The best-fit frequency
mismatch (χ_A − χ_B)/π (see also Extended Data Fig. 3) is 182 ± 32 kHz
(a, d), 220 ± 18 kHz (b, e) and 275 ± 7 kHz (c, f). Concurrence is calculated
after postselection on V_int < V_th (M_p = +1) or V_int > V_th (M_p = −1).
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Extended Data Figure 7 | Cumulative histograms of parity measurements. 
The four computational states are subjected to a parity measurement with
τ_p = 300 ns, n̄_ss = 2.5, as in Fig. 3a. At the optimal threshold V_th (dashed line),
the average errors in determining the parity are ε_e = 0.13 and ε_o = 0.11, yielding
a parity measurement fidelity of F_p = 1 − ε_e − ε_o = 0.76 (corrected for
residual qubit excitations, see Methods). In a similar manner, we define the
distinguishability within each parity subspace as the fidelity of the
measurement discriminating between those states, yielding 0.03 for the even
subspace and 0.02 for the odd.
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Extended Data Figure 8 | Frequency-dependent coherence times of Qz. 
Energy relaxation times Te (filled circles) and Te (square) below the 
fundamental cavity resonance are consistent with the single-mode Purcell 
effect** and a coupling strength g/t = 167 MHz at the Qz-cavity avoided 
crossing, as extracted from spectroscopy (Extended Data Fig. 1). We attribute 
the lower TP above the fundamental resonance to the effect of higher cavity 
modes. Pure dephasing times T? "8 (open circles) are in excellent agreement 
with the first-order approximation for flux noise*’ with spectral density Sp 
(w) = A’/\f| and best-fit A = (1.9 + 0.1) X 10 °®y (dashed line), with ®p the 
flux quantum. 
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Extended Data Figure 9 | Pulse timing and measurement integration 
windows. Extended view (not to scale) of the pulse sequence used in Figs 2-4, 
showing also the integration windows used for parity measurement (tj; for 
signal and To¢ for offset) and tomographic joint readout. All specified time 
intervals are expressed in ns. Qubit control is performed with DRAG pulses
with Gaussian envelopes on the main quadrature (σ = 6 ns, 4σ total duration)
and derivative-of-Gaussian envelopes of optimized amplitude on the other.
Single-qubit pulses are applied sequentially (Q_B first), with 10 ns buffer between
them. The tomography measurement pulse is 1 μs long, and the homodyne
response is integrated for 850 ns starting after the first 100 ns.

Coherent Raman spectro-imaging with laser frequency combs


Takuro Ideguchi, Simon Holzner, Birgitta Bernhardt, Guy Guelachvili, Nathalie Picqué & Theodor W. Hänsch


Advances in optical spectroscopy and microscopy have had a pro- 
found impact throughout the physical, chemical and biological 
sciences. One example is coherent Raman spectroscopy, a versatile 
technique interrogating vibrational transitions in molecules. It 
offers high spatial resolution and three-dimensional sectioning 
capabilities that make it a label-free tool for the non-destructive
and chemically selective probing of complex systems. Indeed, single-colour
Raman bands have been imaged in biological tissue at
video rates by using ultra-short-pulse lasers. However, identifying
multiple, and possibly unknown, molecules requires broad spectral
bandwidth and high resolution. Moderate spectral spans combined
with high-speed acquisition are now within reach using multichannel
detection or frequency-swept laser beams. Laser frequency
combs are finding increasing use for broadband molecular linear
absorption spectroscopy. Here we show, by exploring their
potential for nonlinear spectroscopy, that they can be harnessed
for coherent anti-Stokes Raman spectroscopy and spectro-imaging.
The method uses two combs and can simultaneously measure, on 
the microsecond timescale, all spectral elements over a wide band- 
width and with high resolution on a single photodetector. Although 
the overall measurement time in our proof-of-principle experi- 
ments is limited by the waiting times between successive spectral 
acquisitions, this limitation can be overcome with further system 
development. We therefore expect that our approach of using laser 
frequency combs will not only enable new applications for non- 
linear microscopy but also benefit other nonlinear spectroscopic 
techniques. 

Coherent anti-Stokes Raman spectroscopy (CARS) is a nonlinear 
four-wave mixing process, which is coherently driven when the energy 
difference between a pump laser and a Stokes laser is resonant with a 
Raman active molecular transition. Scattering off the probe beam 
generates the high-frequency-shifted anti-Stokes signal, which is 
enhanced by many orders of magnitude relative to spontaneous 
Raman scattering signals. In our technique of dual-comb CARS, we
harness two femtosecond lasers with repetition frequencies f + δf and f
to irradiate a sample. In the time domain (Fig. 1a), a pulse from the first
laser coherently excites a molecular vibration of period 1/f_vib that is
longer than the pulse duration, and the coherently vibrating molecules
give rise to an oscillating refractive index modulated at the vibrational
frequency (Fig. 1b). A pulse of the second laser probes the
sample with a time separation Δt that increases linearly from pulse
pair to pulse pair. If this second pulse (for simplicity also taken to be
short compared with the molecular vibration period) arrives after a
full molecular period 1/f_vib, the vibration amplitude is increased and
the back-action on the probe pulse is a spectral shift towards lower
frequencies. If it arrives after half a period, the vibration amplitude is
damped and the pulse experiences a shift towards higher frequencies.
As long as the pulse separation Δt remains shorter than the coherence
time of the molecular oscillation, an intensity modulation of frequency
f_vib δf/f is thus observed in the transmitted probe radiation after a
spectral edge filter. The two femtosecond lasers have a symmetrical
function: the sign of the time separation Δt between the pulses changes
every 1/(2δf). A theoretical description of such a time-resolved CARS
signal can be found in ref. 19.

In the frequency domain (Fig. 1c, d), the two frequency comb generators
produce an optical spectrum consisting of several hundred
thousand perfectly evenly spaced spectral lines. Their frequencies
may be described by

$$f_{m,1} = m(f + \delta f) + f_{\mathrm{ceo}}$$

$$f_{m',2} = m' f + f'_{\mathrm{ceo}}$$

where m and m′ are integers, and f_ceo and f′_ceo are the carrier-envelope
offset frequencies.

The frequency differences within each comb form regular combs
themselves with vanishing carrier-envelope offset frequencies and line
spacings of f + δf and f, respectively. For instance, for comb 1 all pairs
of lines with m − n = k contribute to the same difference frequency
k(f + δf). Each of the difference frequency combs resonantly excites a
molecular level of frequency f_vib by means of Raman-like two-photon
excitation whenever a difference frequency comes close to f_vib; that is,
when k ≈ f_vib/f. The excitations by the two combs interfere and modulate
the molecular vibration at a beat note frequency kδf = f_vib δf/f. The
two-photon excitation leads to a resonant enhancement of the third-order
nonlinear susceptibility observed by means of the anti-Stokes
radiation. The intensity of the generated broadband anti-Stokes radiation
is modulated at the beat note frequency f_vib δf/f. When several
vibrational levels (f_vib1, f_vib2, ...) are excited, the composite modulation
contains all the beating frequencies (f_vib1 δf/f, f_vib2 δf/f, ...) representative
of the involved levels. The Raman excitation spectrum is revealed
by Fourier transformation of the intensity recorded against time. The
spectrum is mapped in the radiofrequency domain by the downconversion
factor δf/f (typically of the order of 10⁻⁷ to 10⁻⁶). This permits
rapid measurements and efficient signal processing. Absolute calibration
of the Raman shifts is achieved by dividing the radiofrequencies
by the downconversion factor, which is easy to measure accurately.
The carrier-envelope offsets cancel and do not have to be measured or
controlled. This notably simplifies the experimental implementation
and the calibration procedure. Similar modulation transfer phenomena
have been exploited in experiments using a single femtosecond
laser and a phase-modulation pulse shaper or a Michelson interferometer,
but measurement times were fundamentally limited either
by the sweep period of the phase modulation or by the mechanical
motion in the Michelson interferometer. Our motionless frequency-comb-based
technique enables more than 1,000-fold shorter acquisition
times (see also Supplementary Information), and a spectral
resolution and spectral span limited only by the measurement time
and the spectral bandwidth of the femtosecond lasers.

Figure 2 sketches the experimental setup (see Methods), which is
similar to that used in dual-comb absorption spectroscopy except
for dispersion management and spectral filtering to isolate the CARS
signal from the comb beams. As the Raman-like two-photon excitation
involves virtual energy levels, dispersion decreases both the spectral span
and the excitation efficiency. The time-domain interference signal (the
interferogram) is periodic. Every 1/δf, a strong burst mostly contains
the non-resonant four-wave mixing signal resulting from the interference
between the overlapping pulses of the two combs. A reproducible
modulation (Fig. 3a), due to the CARS signal only, follows the burst and
has a duration proportional to the coherence time of the sample transitions.
A time-windowed portion of the interferogram, which excludes
the interferometric non-resonant contribution, is Fourier transformed.
The width of the window is chosen according to the desired spectral
resolution (see Methods for a detailed explanation of the recording
parameters). The resulting spectra (Fig. 3b-d) span Raman shifts from
200 cm⁻¹ to 1,400 cm⁻¹. The non-resonant background, which strongly
lowers the sensitivity of CARS, is entirely suppressed, as in other specific
CARS schemes.

Figure 1 | Principle of dual-comb CARS. Δf_vib stands for f_vib δf/f. a, Time-domain
representation in the limit of a molecular decoherence time that is
shorter than the time interval between two laser pulses. The train of pulses of
laser frequency comb 1 periodically excites the molecular vibration, which is
probed by the pulses of laser frequency comb 2 with a linearly increasing time
delay. The resulting filtered anti-Stokes radiation provides the interferogram.
The two combs have a symmetric function. In the figure, only the situation in
which the delay between the pulses of comb 2 and those of comb 1 is positive is
displayed. b, When the probe pulse is short compared with the molecular
oscillation (impulsive stimulated Raman scattering), the refractive index
modulation of the sample (induced by the pump and Stokes beams) shifts the
probe spectrum alternately towards lower and higher frequencies.
c, Frequency-domain representation in the limit of a comb line spacing that is
larger than the resonant Raman excitation bandwidth. The two frequency
combs modulate the excitation amplitude of the molecular vibration with a
frequency Δf_vib. This modulation is then transferred by the combs to the
anti-Stokes radiation. For simplicity, the Raman excitations are represented as
narrow lines. d, Energy-level diagram, illustrating the four-wave mixing that
leads to intensity-modulated anti-Stokes radiation.

We illustrate acquisition times with three spectra at an apodized
resolution of 4 cm⁻¹ and recorded with δf = 100 Hz (Fig. 3b) or 5 Hz
(Fig. 3c, d) for a mixture of hexafluorobenzene, nitrobenzene, nitromethane
and toluene in a cuvette 5 mm long. The spectra involve
no averaging and were measured in 14.8 μs (Fig. 3b) and 295.5 μs
(Fig. 3c, d); the number of individual spectral elements (defined as
the spectral span divided by the resolution) for all three spectra is
300. The signal-to-noise ratio culminates at 1,000 for the most intense
blended line of toluene and nitrobenzene in Fig. 3c. Recorded under
different experimental conditions, the three spectra show great similarities
in line position and relative intensity.

Imaging capabilities are illustrated with a capillary plate (25-μm
diameter holes, thickness 500 μm) filled with a mixture of hexafluorobenzene,
nitromethane and toluene. For each pixel, we measure an interferogram
within 12 μs to obtain a spectrum at an apodized resolution of
10 cm⁻¹. The total measurement time of 40.5 s for the 45 × 45 μm²
hyperspectral image corresponds to an acquisition rate of 50 pixels s⁻¹;
it is limited by the refresh rate of the interferograms, although the entire
sampling time of the interferograms, which are Fourier transformed to
give the spectral hypercube in Fig. 4, lasts only 24.3 ms.

Figure 2 | Experimental setup for dual-comb CARS spectro-imaging. See Methods for details. The colour code is consistent with that of Fig. 1. Ti-Sa,
titanium-sapphire.

Figure 3 | High-resolution dual-comb CARS of a mixture of liquid
chemicals. a, Unaveraged interferogram showing the non-resonant
interference signal around the zero time delay and the interferometric
modulation of the vibrational transitions shown in c (δf = 5 Hz; energy per
pulse 3 nJ). b, Dual-comb CARS unaveraged spectrum (δf = 100 Hz;
measurement time 14.8 μs; apodized resolution 4 cm⁻¹; energy per pulse 3 nJ).
c, Dual-comb CARS unaveraged spectrum (δf = 5 Hz; measurement time
295.5 μs; apodized resolution 4 cm⁻¹; energy per pulse 3 nJ). d, Dual-comb
CARS unaveraged spectrum (δf = 5 Hz; measurement time 295.5 μs; apodized
resolution 4 cm⁻¹; energy per pulse 0.5 nJ). The insets in b and c magnify the
vertical scale tenfold.

Figure 4 | Hyperspectral image of a capillary plate with holes filled with a
chemical mixture. Each of the 2,025 pixels of the hyperspectral cube
corresponds to a spectrum at 10 cm⁻¹ apodized resolution measured within
12 μs at a fixed spatial location and provides the spectral signature of
compounds present in this part of the sample. Scale numbers on the images
indicate pixels; the spectrum shown in the centre corresponds to pixel (21,16).
Each spectral element of the cube may be plotted as an image (that is, intensity
for all pixels at a fixed wavenumber) similar to the four that are shown and
provides the spatial quantitative distribution of a given compound with a
distinguishable spectral signature at that wavenumber.

Taken together, the above proof-of-principle experiments demonstrate
the potential of our method for the rapid acquisition of both
high-resolution spectra spanning a broad bandwidth and hyperspectral
images of vibrational transitions. A unique feature of our technique
is its multiplex nature: all the spectral elements are measured at
the same time on a single photodetector, which ensures consistency of
the spectra. Moreover, the frequency combs guarantee the reproducibility
and accuracy of the wavenumber scale. However, a major limitation
in the present proof-of-principle experiments is the low duty cycle (the
ratio between the time it takes to measure an interferogram and the time
before the next interferogram is measured), which for the spectro-imaging
experiments shown in Fig. 4 is only 6 × 10⁻⁴. The interferogram
refresh time is the inverse of the difference of the laser repetition
frequencies, 1/δf, whereas the spectral information is only collected when
the delay between the pairs of pulses is shorter than the coherent
molecular ringing time (see Supplementary Information for more
detailed discussion). One solution would be to use combs with a larger
line spacing, which could allow the duty cycle of interferogram acquisition
to approach unity while keeping measurement times and signal-to-noise
ratios similar to those in Figs 3 and 4. Frequency comb generators
based on solid-state lasers with a short cavity or on chip-scale microresonators
might offer a route towards realizing such high duty cycle
experiments. For microscopy experiments, scanning the laser beam with 
a galvanometric mirror rather than the sample stage provides a straight- 
forward way to speed up the mapping process. High-speed cameras 
(more than 10° framess ') could even allow real-time hyperspectral 
dual-comb CARS imaging. There is also scope for improvements that 
would affect the signal itself. For example, more sophisticated dispersion
management, particularly of third-order dispersion, would enhance
the signal-to-noise ratio. This could be complemented with fast syn- 
chronous or differential detection schemes that might further decrease 
the noise level. Finally, few-cycle oscillators or spectral broadening with 
nonlinear fibres will expand the spectral span of the setup. 
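
The timing figures quoted for the hyperspectral image can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes one interferogram per pixel and one pixel per refresh period 1/δf, takes δf = 50 Hz from the Methods, and uses hypothetical names (`ImagingTiming`, `imaging_timing`) that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagingTiming:
    refresh_time_s: float      # time between successive interferograms, 1/delta_f
    total_time_s: float        # wall-clock time for the whole image
    sampling_time_s: float     # time actually spent recording interferograms
    duty_cycle: float          # sampling window divided by refresh time
    pixels_per_second: float   # acquisition rate when one pixel is taken per refresh


def imaging_timing(n_pixels: int, delta_f_hz: float, window_s: float) -> ImagingTiming:
    """Timing budget for raster imaging with one interferogram per pixel (assumed)."""
    if n_pixels <= 0 or delta_f_hz <= 0 or window_s <= 0:
        raise ValueError("all arguments must be positive")
    refresh_time_s = 1.0 / delta_f_hz
    if window_s > refresh_time_s:
        raise ValueError("the sampling window cannot exceed the refresh time")
    return ImagingTiming(
        refresh_time_s=refresh_time_s,
        total_time_s=n_pixels * refresh_time_s,
        sampling_time_s=n_pixels * window_s,
        duty_cycle=window_s / refresh_time_s,
        pixels_per_second=delta_f_hz,
    )


if __name__ == "__main__":
    # 45 x 45 pixels, delta_f = 50 Hz, 12 microseconds recorded per pixel
    timing = imaging_timing(n_pixels=45 * 45, delta_f_hz=50.0, window_s=12e-6)
    print(f"total time     : {timing.total_time_s:.1f} s")
    print(f"sampling time  : {timing.sampling_time_s * 1e3:.1f} ms")
    print(f"duty cycle     : {timing.duty_cycle:.1e}")
    print(f"pixels per sec : {timing.pixels_per_second:.0f}")
```

Under these assumptions the budget reproduces the 40.5 s total time, the 24.3 ms of actual sampling and the duty cycle of 6 × 10⁻⁴ stated in the text, which makes explicit that almost all of the measurement time is spent waiting for the next interferogram.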

Several schemes exploiting coherent Raman scattering for novel spectroscopy
and microscopy applications have recently emerged, and we
expect that their combination with our method will deliver techniques
with improved performance and utility. For example, our dual-comb
approach could benefit surface-enhanced CARS measurements or
studies of Raman optical activity. Moreover, exciting imaging capabilities
might arise when extending our method to exploit either near-field
effects (for example at a metal tip) or far-field effects (for example state
depletion) to achieve sub-wavelength spatial resolution.


METHODS SUMMARY 


Two titanium-sapphire lasers (Synergy20 UHP; Femtolasers) emit 20-fs pulses
centred at 12,580 cm⁻¹ with energies up to 13 nJ. Both have adjustable repetition
frequencies of about 100 MHz. The beams of the two lasers (Fig. 2) are combined
on a beamsplitter, and a chirped mirror compressor (Layertec) compensates for
the second-order dispersion induced by the optical components of the setup.
Spectral filtering is applied to improve the signal-to-background ratio. A
low-frequency-pass filter (ET750LP, cutoff 13,330 cm⁻¹; Chroma Technology) before
the sample and a high-frequency-pass filter (3RD740SP, cutoff 13,510 cm⁻¹;
Omega Optical Inc.) after the sample isolate the CARS signal that is generated
by the sample after proper focusing with a lens or a microscope objective. The
spectral span is thus limited on the low-energy side by the optical filters and on the
high-energy side by the spectral bandwidth of the femtosecond lasers. The
anti-Stokes radiation is forward-collected and focused on a silicon photodiode with a
frequency bandwidth of the order of 100 MHz. The electric signal is low-pass
filtered to 50 MHz to avoid aliasing. It is then amplified and digitized with a
16-bit data acquisition board (1.8 × 10⁸ samples s⁻¹, ATS9462; AlazarTech).


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
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METHODS 


Detailed experimental setup. Two titanium-sapphire lasers (Synergy20 UHP; 
Femtolasers) emit 20-fs pulses centred at 12,580 cm⁻¹ with energies up to 13 nJ.
Titanium-sapphire lasers are chosen because of their capabilities to generate ultra- 
short pulses in a spectral region where most samples have no or only weak absorp- 
tion and where advanced photonics tools are available. Both oscillators have 
repetition frequencies of about 100 MHz, which can be adjusted by moving a 
cavity mirror mounted on a motorized translation stage and a piezoelectric trans- 
ducer. The repetition frequencies are monitored with fast silicon photodiodes 
connected to frequency counters (53131A; Agilent). To prevent long-term drifts, 
the repetition frequency of each laser comb is stabilized against a radiofrequency 
clock by means of a mirror of the laser’s cavity mounted on a piezoelectric trans- 
ducer. This does not affect the quality of an individual spectrum but improves the 
reproducibility of the wavenumber scale of a sequence of spectra. The laser beams 
are linearly polarized. The pulse energy available for the spectroscopy experiments 
is adjusted for each laser beam individually with a combination of a half-wave plate 
and a polarizer. The beams of the two lasers are combined (Fig. 2) on a pellicle 
beamsplitter, and a chirped mirror compressor (Layertec) compensates for the 
second-order dispersion induced by the optical components of the setup. Spectral 
filtering is applied to improve the signal-to-background ratio. A low-frequency-pass 
optical filter (ET750LP, cutoff 13,330 cm⁻¹; Chroma Technology) before the sample
and a high-frequency-pass optical filter (3RD740SP, cutoff 13,510 cm⁻¹; Omega
Optical Inc.) after the sample isolate the CARS signal that is generated by the sample 
after proper focusing with a lens or a microscope objective. The spectral span is thus 
limited on the low-energy side by the optical filters and on the high-energy side by 
the spectral bandwidth of the femtosecond lasers. The anti-Stokes radiation is 
forward-collected and focused on a single silicon photodiode with a frequency 
bandwidth of the order of 100 MHz. The electric signal is low-pass filtered to 
50 MHz to avoid aliasing. The non-interferometric signal, which occurs at the pulse 
repetition frequency, is also filtered out. This non-interferometric signal (CARS and 
non-resonant signal within a single laser pulse) is the main source of undesired 
background. The interferometric signal is then amplified with a wideband variable- 
gain voltage amplifier (DHPVA-100; FEMTO Messtechnik GmbH) and digitized 
with a 16-bit data acquisition board (1.8 × 10⁸ samples s⁻¹, ATS9462; AlazarTech).
Apodization and Fourier transformation may be accomplished in real time with the 
use of field-programmable gate arrays or a posteriori with a basic desktop computer. 

The data of Fig. 3 were recorded with orthogonal linear polarizations of the two 
laser beams. This decreased the interferometric non-resonant background, while 
the fast depolarization of the sample maintained the strength of the anti-Stokes 
signal. For the spectra of Fig. 3b, c, the focusing optics consisted of lenses with a 
focal length of 20 mm and required an amount of dispersion compensation of
−600 fs². The pulse energy at the sample was 3 nJ. The spectrum of Fig. 3d was
measured with a focusing lens with a focal length of 8 mm, a pulse energy of 0.5 nJ 
and an avalanche photodetector (APD Module C4777; Hamamatsu). 

To record the hyperspectral images (Fig. 4), the difference in repetition frequencies
of the two combs was set to 50 Hz and a microscope objective (LCPLN20XIR;
Olympus) focused the beams on the sample, with a beam diameter of 1.9 μm and a
Rayleigh length of 3.4 μm. The pulse energy at the sample was 3.8 nJ and a second-order
dispersion of −3,000 fs² was compensated for. The sample was mounted on
a motorized x-y platform (MLS203, Thorlabs) to raster-scan the sample in
1-μm steps.

Choice of recording parameters. In our experiments of coherent anti-Stokes
Raman spectroscopy with two laser frequency combs, the vibrational levels that
are excited in a Raman two-photon process have an energy f_vib that matches a
frequency difference between pairs of lines of one comb. The vibration is modulated
by the interference between the excitation by the two combs and is thus
downconverted to the frequency f_vib δf/f.

The energy f_vib,max (in hertz) of the highest-lying vibrational level that can be
observed is about the spectral bandwidth of the combs ΔF (in hertz), roughly twice
the full width at half-maximum of the laser spectrum:

$$f_{\mathrm{vib,max}} = \Delta F$$

The difference δf in repetition frequencies between the two lasers must be adjusted
to image the domain 0–ΔF into a radiofrequency free spectral range of 0–f/2 at
most. Thus, to avoid aliasing, one has to choose δf ≤ f²/(2ΔF). For the fastest
recording times, δf should be chosen equal to f²/(2ΔF). However, for signal-to-noise
ratio improvement, it may be advantageous to set δf to a lower value, as
illustrated in Fig. 3.
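
The aliasing condition follows from requiring the downconverted span ΔF δf/f to fit within f/2. As a minimal sketch of that constraint (the function names are hypothetical, and the numbers are the f = 100 MHz and ΔF = 50 THz values quoted later in this section):

```python
def max_repetition_difference(f_rep_hz: float, bandwidth_hz: float) -> float:
    """Largest delta_f for which the span 0..bandwidth maps into 0..f_rep/2.

    Derived from bandwidth * delta_f / f_rep <= f_rep / 2.
    """
    if f_rep_hz <= 0 or bandwidth_hz <= 0:
        raise ValueError("f_rep_hz and bandwidth_hz must be positive")
    return f_rep_hz ** 2 / (2.0 * bandwidth_hz)


def rf_span(f_rep_hz: float, delta_f_hz: float, bandwidth_hz: float) -> float:
    """Radiofrequency span occupied by the downconverted Raman spectrum."""
    return bandwidth_hz * delta_f_hz / f_rep_hz


if __name__ == "__main__":
    f_rep = 100e6        # comb line spacing, Hz
    bandwidth = 50e12    # optical span Delta F, Hz
    delta_f_max = max_repetition_difference(f_rep, bandwidth)
    print(f"largest alias-free delta_f: {delta_f_max:.1f} Hz")
    for delta_f in (5.0, 100.0):
        span = rf_span(f_rep, delta_f, bandwidth)
        print(f"delta_f = {delta_f:5.1f} Hz -> RF span {span / 1e6:.1f} MHz "
              f"(limit {f_rep / 2e6:.1f} MHz)")
```

With these inputs the limit is δf = 100 Hz, so the 100 Hz setting uses the full radiofrequency range while the 5 Hz setting occupies only a twentieth of it.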

The Fourier transform of the interferometric signal provides a radiofrequency
spectrum with a free spectral range equal to ΔF δf/f (and less than or equal to f/2).
The Raman-shift scale is retrieved by dividing the measured radiofrequency scale
by the downconversion factor δf/f.
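
To illustrate how the Raman-shift scale is recovered, the following sketch synthesizes an idealized, noise-free interferogram and rescales its Fourier spectrum by f/δf. Everything in it is an assumption made for illustration: the two Raman lines at 15 THz and 30 THz are hypothetical, the signal is a plain sum of cosines with no non-resonant burst or decay, the sample rate simply matches the digitizer rate quoted in the Methods, and the peak picking is a simple local-maximum search rather than the authors' processing.

```python
import numpy as np


def simulate_interferogram(raman_lines_hz, amplitudes, f_rep_hz, delta_f_hz,
                           sample_rate_hz, duration_s):
    """Noise-free sum of cosines at the beat notes f_vib * delta_f / f_rep."""
    n_samples = int(round(sample_rate_hz * duration_s))
    t = np.arange(n_samples) / sample_rate_hz
    beat_notes_hz = np.asarray(raman_lines_hz, dtype=float) * delta_f_hz / f_rep_hz
    signal = np.zeros(n_samples)
    for amplitude, beat_hz in zip(amplitudes, beat_notes_hz):
        signal += amplitude * np.cos(2.0 * np.pi * beat_hz * t)
    return signal


def recover_raman_shifts(signal, f_rep_hz, delta_f_hz, sample_rate_hz,
                         rel_threshold=0.5):
    """Fourier transform with triangular apodization, then rescale RF to optical."""
    n = len(signal)
    window = np.bartlett(n)  # triangular apodization
    magnitude = np.abs(np.fft.rfft(signal * window))
    rf_axis_hz = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    inner = magnitude[1:-1]
    # Keep bins that are local maxima and exceed a fraction of the strongest line.
    is_peak = ((inner > magnitude[:-2]) & (inner > magnitude[2:])
               & (inner > rel_threshold * magnitude.max()))
    # Dividing by the downconversion factor delta_f / f_rep restores Raman shifts.
    return rf_axis_hz[1:-1][is_peak] * f_rep_hz / delta_f_hz


if __name__ == "__main__":
    f_rep, delta_f = 100e6, 100.0
    lines_hz = [15e12, 30e12]  # hypothetical Raman lines
    signal = simulate_interferogram(lines_hz, [1.0, 0.8], f_rep, delta_f,
                                    sample_rate_hz=180e6, duration_s=15e-6)
    recovered_hz = recover_raman_shifts(signal, f_rep, delta_f, 180e6)
    speed_of_light_cm_per_s = 2.99792458e10
    for shift_hz in recovered_hz:
        print(f"{shift_hz / 1e12:.2f} THz = "
              f"{shift_hz / speed_of_light_cm_per_s:.1f} cm^-1")
```

Because the rescaling uses only δf/f, the sketch also reflects the point made in the main text that the carrier-envelope offsets play no role in calibrating the Raman axis.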

The resolution δν_rf (in hertz) in the radiofrequency domain, in a magnitude
spectrum with triangular apodization, is set by the inverse of the measurement
time T:

$$\delta\nu_{\mathrm{rf}} = 1.8/T$$

The optical resolution δν_opt (in hertz) of the Raman spectrum is retrieved by
dividing the radiofrequency resolution δν_rf by the downconversion factor δf/f:

$$\delta\nu_{\mathrm{opt}} = 1.8\,f/(\delta f\,T)$$


The instrumental resolution is fundamentally limited by the line spacing f of the 
comb. For most mode-locked lasers the line spacing is within the range 50 MHz to 
1 GHz. Thus this limitation is not an issue in most liquid-phase studies, because
the width of the vibrational bands is generally broader than 100 GHz (3.3 cm⁻¹). It
is, however, possible to improve the resolution by interleaving successively
acquired spectra recorded with slightly different radiofrequency line spacings. 

The difference δf in repetition frequencies between the two lasers also determines
the interferogram refresh rate. Every 1/δf the pulse train of one laser scans
the entire pulse period of the second laser comb to generate a single interferogram, 
which is afterwards time-windowed to provide the desired resolution. This refresh 
rate limits the speed of successive acquisitions and thus is currently the main 
limitation for fast hyperspectral imaging experiments. A detailed discussion in 
Supplementary Information elaborates on this difficulty and shows that frequency 
combs of a relatively large line spacing (1 GHz) hold promise to overcome it. 

In our experiment we use lasers with a repetition frequency of about
f = 100 MHz. When the difference of repetition frequencies of the two lasers is
set to δf = 100 Hz, the downconversion factor δf/f is 10⁻⁶. The radiofrequency
free spectral range is 50 MHz and the optical free spectral range ΔF is 50 THz
(1,668 cm⁻¹). A recording time of 15 μs leads to a radiofrequency resolution δν_rf of
120 kHz, which converts to an optical resolution δν_opt of 120 GHz (4 cm⁻¹). The
refresh time of the interferograms is 10 ms.
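
The numbers in this worked example can be reproduced directly from the relations given above. The sketch below is illustrative only: `recording_parameters` and `hz_to_wavenumber` are hypothetical helper names, and the factor 1.8 is the triangular-apodization factor used in the resolution formula.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10


def hz_to_wavenumber(frequency_hz: float) -> float:
    """Convert a frequency in hertz to a wavenumber in cm^-1."""
    return frequency_hz / SPEED_OF_LIGHT_CM_PER_S


def recording_parameters(f_rep_hz: float, delta_f_hz: float, record_time_s: float,
                         apodization_factor: float = 1.8) -> dict:
    """Derived quantities for a dual-comb recording, following the Methods relations."""
    if min(f_rep_hz, delta_f_hz, record_time_s) <= 0:
        raise ValueError("all inputs must be positive")
    downconversion = delta_f_hz / f_rep_hz
    rf_fsr_hz = f_rep_hz / 2.0
    rf_resolution_hz = apodization_factor / record_time_s
    return {
        "downconversion": downconversion,
        "rf_fsr_hz": rf_fsr_hz,
        # Largest optical span that fits in the RF free spectral range.
        "optical_fsr_hz": rf_fsr_hz / downconversion,
        "rf_resolution_hz": rf_resolution_hz,
        "optical_resolution_hz": rf_resolution_hz / downconversion,
        "refresh_time_s": 1.0 / delta_f_hz,
    }


if __name__ == "__main__":
    p = recording_parameters(f_rep_hz=100e6, delta_f_hz=100.0, record_time_s=15e-6)
    print(f"downconversion factor : {p['downconversion']:.0e}")
    print(f"optical FSR           : {p['optical_fsr_hz'] / 1e12:.0f} THz "
          f"({hz_to_wavenumber(p['optical_fsr_hz']):.0f} cm^-1)")
    print(f"RF resolution         : {p['rf_resolution_hz'] / 1e3:.0f} kHz")
    print(f"optical resolution    : {p['optical_resolution_hz'] / 1e9:.0f} GHz "
          f"({hz_to_wavenumber(p['optical_resolution_hz']):.1f} cm^-1)")
    print(f"refresh time          : {p['refresh_time_s'] * 1e3:.0f} ms")
```

Changing `delta_f_hz` to 5 Hz and `record_time_s` to 295.5 μs gives an optical resolution close to 4 cm⁻¹ as well, consistent with the slower recordings of Fig. 3c, d.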

Nucleation of aerosol particles from trace atmospheric vapours is
thought to provide up to half of global cloud condensation nuclei.
Aerosols can cause a net cooling of climate by scattering sunlight
and by leading to smaller but more numerous cloud droplets,
which makes clouds brighter and extends their lifetimes.
Atmospheric aerosols derived from human activities are thought
to have compensated for a large fraction of the warming caused by
greenhouse gases. However, despite its importance for climate,
atmospheric nucleation is poorly understood. Recently, it has been
shown that sulphuric acid and ammonia cannot explain particle
formation rates observed in the lower atmosphere. It is thought
that amines may enhance nucleation, but until now there has
been no direct evidence for amine ternary nucleation under atmospheric
conditions. Here we use the CLOUD (Cosmics Leaving
OUtdoor Droplets) chamber at CERN and find that dimethyl- 
amine above three parts per trillion by volume can enhance particle 
formation rates more than 1,000-fold compared with ammonia, 
sufficient to account for the particle formation rates observed in 
the atmosphere. Molecular analysis of the clusters reveals that the 
faster nucleation is explained by a base-stabilization mechanism 
involving acid-amine pairs, which strongly decrease evaporation. 
The ion-induced contribution is generally small, reflecting the 
high stability of sulphuric acid-dimethylamine clusters and indi- 
cating that galactic cosmic rays exert only a small influence on their 
formation, except at low overall formation rates. Our experimental 
measurements are well reproduced by a dynamical model based on 
quantum chemical calculations of binding energies of molecular 
clusters, without any fitted parameters. These results show that, in 
regions of the atmosphere near amine sources, both amines and 
sulphur dioxide should be considered when assessing the impact of 
anthropogenic activities on particle formation. 

The primary vapour responsible for atmospheric nucleation is 
thought to be sulphuric acid (H2SO4), derived from the oxidation of 
sulphur dioxide. However, peak daytime H2SO4 concentrations in the 
atmospheric boundary layer are about 10⁶ to 3 × 10⁷ cm⁻³ (0.04-1.2 
parts per trillion by volume (p.p.t.v.)), which results in negligible bin- 
ary homogeneous nucleation of H2SO4-H2O (ref. 3). Additional spe- 
cies such as ammonia or amines*” are therefore necessary to stabilize 
the embryonic clusters and decrease evaporation. However, ammonia 
cannot account for particle formation rates observed in the boundary 
layer’ and, despite numerous field and laboratory studies*"*, amine 
ternary nucleation has not yet been observed under atmospheric con- 
ditions. Amine emissions are dominated by anthropogenic activities 
(mainly animal husbandry), but about 30% of emissions are thought to 
arise from the breakdown of organic matter in the oceans, and 20% 
from biomass burning and soil*’’. Atmospheric measurements of gas- 
phase amines are sparse, but typical values range between negligible 
and a few tens of p.p.t.v. per amine species'”°. 
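
The correspondence between number concentrations and mixing ratios quoted above can be sketched with the ideal gas law. This Python example is a rough illustration, not the authors' procedure: the function names are hypothetical, the temperature of 278 K is taken from the experimental conditions, and the pressure of 101,325 Pa is an assumption (a somewhat lower chamber pressure would give slightly larger mixing ratios).

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def air_number_density_cm3(temperature_k, pressure_pa):
    """Number density of air molecules in cm^-3 from the ideal gas law."""
    return pressure_pa / (K_B * temperature_k) * 1e-6  # m^-3 -> cm^-3


def to_pptv(conc_cm3, temperature_k=278.0, pressure_pa=101325.0):
    """Convert a number concentration (cm^-3) to parts per trillion by volume.

    The default pressure is an assumed value, not one stated in the text.
    """
    return conc_cm3 / air_number_density_cm3(temperature_k, pressure_pa) * 1e12


low = to_pptv(1e6)    # lower end of the daytime H2SO4 range
high = to_pptv(3e7)   # upper end of the daytime H2SO4 range
```

Under these assumptions the two ends of the range come out close to the 0.04 and 1.2 p.p.t.v. stated in the text.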

Here we report results from the CLOUD experiment at CERN 
(for experimental details see Methods, Extended Data Fig. 1 and 
Supplementary Information). The data were obtained during three 
campaigns at the CERN Proton Synchrotron between October 2010 
and November 2012, and comprise measurements of sulphuric acid- 
amine nucleation at atmospheric concentrations. Dimethylamine 
(DMA; C2H7N) was selected for this study because it is expected to 
have cluster binding energies representative of other light alkyl amines’. 

Nucleation rates J were measured under neutral (Jn), galactic cosmic 
ray (Jgcr) and π⁺ beam (Jπ) conditions, corresponding to ion-pair 
concentrations of about 0, 650 and 3,000 cm⁻³, respectively. Both 
Jgcr and Jπ comprise the sum of neutral and ion-induced nucleation 
rates, whereas Jn measures the neutral rate alone. Figure 1 shows the 
nucleation rates at 1.7 nm mobility diameter (1.4 nm mass diameter) 
as a function of [H2SO4] for ‘binary’ (H2SO4-H2O), ammonia ternary 
(H2SO4-NH3-H2O) and amine ternary (H2SO4-DMA-H2O) nuc- 
leation at 278 K and 38% relative humidity (RH). Here ‘binary’ 
includes previous measurements made in the presence of NH3 and 
DMA contaminants’, estimated from later campaigns to be <2 p.p.t.v. 
and <0.1 p.p.t.v., respectively, for the conditions of ref. 3. Nucleation 
Figure 1 key, atmospheric observations: Hohenpeissenberg, Germany (mountain, meadow, forest); Hyytiälä, Finland (boreal forest, two entries); Melpitz, Germany (rural, agricultural, livestock); San Pietro Capofiume, Italy (industrial, agricultural, livestock); Tecamac, Mexico (urban). 
Figure 1 | Plot of experimental, atmospheric and theoretical nucleation 
rates against H2SO4 concentration. Observations in the atmospheric 
boundary layer are indicated by small coloured squares*!”’. The CLOUD data, 
recorded at 38% RH and 278 K, show Jgcr with only H2SO4, water and 
contaminants (<0.1 p.p.t.v. DMA and <2 p.p.t.v. NH3) in the chamber (open 
black circles, curve 1); Jgcr with <0.1 p.p.t.v. DMA and 2-250 p.p.t.v. NH3 
(coloured triangles, curve 2); and Jn, Jgcr and Jπ with 10 p.p.t.v. NH3 and 
3-5 p.p.t.v. DMA (coloured circles, curve 3), 5-13 p.p.t.v. DMA (coloured 
circles, curve 4) and 13-140 p.p.t.v. DMA (coloured circles, curve 5). The 
mixing ratios of NH3 or DMA are indicated by a colour scale. The curves are 
drawn to guide the eye; the straight sections follow power laws, J ∝ [H2SO4]ⁿ, 
with fitted slopes n of 3.6 ± 0.5 (curve 1), 2.7 ± 0.1 (curve 2), 5.0 ± 0.8 (curve 
3), 3.6 ± 0.2 (curve 4) and 3.7 ± 0.1 (curve 5). The flattening of curves 1 and 2 at 
higher [H2SO4] results from saturation of the ion production rate and also a 
decreasing contribution of ammonia ternary nucleation. The bars indicate 1σ 
total errors, although the overall factor 2 systematic scale uncertainty on 
[H2SO4] is not shown. Theoretical expectations (ACDC model) are indicated 
for H2SO4 nucleation with 10 p.p.t.v. NH3 (dashed blue line and blue band) and 
for 10 p.p.t.v. DMA plus 10 p.p.t.v. NH3 (dashed red line and orange band, 
assuming a sticking probability of 0.5 for neutral-neutral collisions and 1.0 for 
charged-neutral collisions). The bands correspond to the uncertainty range of 
the theory: +1 and −1 kcal mol⁻¹ binding energy (blue band) and sticking 
probabilities for neutral-neutral collisions between 0.1 and 1.0 (orange band), 
for the lower and upper limits, respectively. 


rates with 5 p.p.t.v. DMA are enhanced more than 1,000-fold com- 
pared with 250 p.p.t.v. ammonia (Fig. 1). Additional DMA up to 
140 p.p.t.v. results in a less than threefold further rate increase, indi- 
cating that amine levels of about 5 p.p.t.v. are sufficient to reach the rate 
limit for amine ternary nucleation under atmospheric conditions 
([H2SO4] = 3 × 10⁷ cm⁻³, or 1.2 p.p.t.v.). 

The amine ternary nucleation rates pass through the band of atmo- 
spheric observations (Fig. 1). However, the latter reveal distinct 
regional differences, with some environments showing nucleation 
rates both above and below the amine limit (boreal forest and moun- 
tain*'**), whereas others are only below the limit (agricultural, live- 
stock, industrial and urban*’*). This suggests that nucleation in 
different regions of the boundary layer may be controlled by different 
ternary vapours. In regions where amines are likely to be present 
(livestock farming and urban), the atmospheric rates are compatible 
with amine nucleation. However, the atmospheric data show consid- 
erable variability, probably resulting from variations in ternary vapour 
concentrations and particle coagulation sinks. When growth rates are 
low, the measured nucleation rates are highly sensitive to particle 
coagulation sinks, which influence particle losses both above and 
below the quoted formation threshold sizes. Losses below the thresh- 
old size are uncorrected, implying higher variability in the atmosphere, 
where conditions are less well defined than in the laboratory. 

Figure 1 shows the theoretical expectations for NH3 (blue band) 
and DMA ternary nucleation (orange band), obtained with the 
Atmospheric Cluster Dynamics Code model (ACDC)** (see Methods 
and Supplementary Information for further details). The model uses 
cluster evaporation and fragmentation rates calculated from quantum 
chemistry, with no fitted parameters”. The agreement is quite good, 
although the model predicts somewhat higher DMA ternary nucleation 
rates than measured experimentally. Part of this discrepancy is due to 
the smaller size—and hence higher formation rate—of the modelled 
clusters (up to four acid and four base molecules per cluster, corres- 
ponding to mobility diameters of 1.2-1.4 nm). Computational studies 
(see Supplementary Information and Extended Data Figs 2 and 3) 
indicate that DMA ternary nucleation is rather insensitive to RH or 
temperature, reflecting the strong acid-base binding. The experimental 
measurements obtained at 38% RH and 278 K may therefore be con- 
sidered representative of a wide range of boundary layer conditions. 

Plots of the nucleation rates Jn, Jgcr and Jπ against DMA mixing ratio 
are shown in Fig. 2a. Here, all measurements have been scaled to 
[H2SO4] = 2.0 × 10⁶ cm⁻³ using the fitted slopes, n, from Fig. 1. The 
addition of only 5 p.p.t.v. DMA enhances the nucleation rate of sul- 
phuric acid particles by more than six orders of magnitude, but the 
addition of further DMA up to 140 p.p.t.v. produces a negligible fur- 
ther increase. The measured neutral, galactic cosmic ray (GCR) and 
beam nucleation rates are indistinguishable, within experimental 
uncertainties. However, a more sensitive determination of the ion- 
induced nucleation rate, Jiin = Jiin⁺ + Jiin⁻, is obtained from direct ion 
measurements with the neutral cluster and air ion spectrometer. The 
ion-induced fractions, Jiin/Jgcr or Jiin/Jπ (Fig. 2b), are found to average 
about 20% at 0.5 cm⁻³ s⁻¹ but grow in relative importance as the total 
nucleation rate decreases. This indicates that the influence of galactic 
cosmic rays on the nucleation of sulphuric acid—amine particles is only 
significant at low overall formation rates. No difference is measured for 
the ion-induced fraction under GCR or beam conditions (Fig. 2b). 
This follows, because the ion-ion recombination lifetimes are below 
10 min and are comparable to the monomer arrival rate on the cluster 
(one molecule per 12 min for H2SO4·HSO4⁻ at 10⁶ cm⁻³ [H2SO4]). 
Consequently, although the ion pair concentration is larger for beam 
conditions, it is compensated for by a shorter ion lifetime, which 
decreases the time available for nucleation before the ion cluster is 
neutralized. 
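
The scaling of measured rates to a common sulphuric acid concentration, described above for Fig. 2a, can be sketched as follows. This Python example assumes that a single power law, J ∝ [H2SO4]ⁿ, holds between the measured and the reference concentration; the function name and the example measurement are hypothetical and are not taken from the data.

```python
def scale_rate(j_measured, conc_measured_cm3, slope_n, conc_ref_cm3=2.0e6):
    """Scale a nucleation rate to a reference [H2SO4], assuming J ~ [H2SO4]**n.

    j_measured        : nucleation rate at the measured concentration (cm^-3 s^-1)
    conc_measured_cm3 : [H2SO4] during the measurement (cm^-3)
    slope_n           : fitted power-law slope for the relevant curve
    conc_ref_cm3      : reference [H2SO4]; 2.0e6 cm^-3 is the value used in the text
    """
    return j_measured * (conc_ref_cm3 / conc_measured_cm3) ** slope_n


# Hypothetical measurement: J = 10 cm^-3 s^-1 at 4.0e6 cm^-3, with slope n = 3.7.
j_scaled = scale_rate(10.0, 4.0e6, 3.7)
```

Because the slopes are steep, a factor of two in [H2SO4] changes the scaled rate by roughly an order of magnitude, so the uncertainty on n propagates strongly into the scaled values.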

Figure 3 shows the molecular composition of nucleating charged 
clusters in the presence of DMA for negative ions (Fig. 3a) and positive 
ions (Fig. 3b), measured with atmospheric-pressure interface time-of- 
flight mass spectrometers (APi-TOFs). The predominant negatively 
charged clusters include an HSO4⁻ or HSO5⁻ ion. The latter is depro- 
tonated peroxysulphuric acid, whose presence varies with the ozone 
concentration in the chamber (it is absent when no ozone is present). 
We found no indication that the nucleation rates are sensitive to the 
relative contribution of these ion species. Contaminant NO3⁻ ions are 
also detected, but at much lower concentrations. The predominant 
positively charged clusters contain a protonated DMA ion, DMA·H⁺ 
(C2H7N·H⁺), in association with H2SO4 and DMA. The remaining 
positive ions are largely protonated light organic contaminants, mostly 
also nitrogen-containing. 

Amine ternary nucleation is observed to proceed by the same base- 
stabilization mechanism as that found previously for ammonia ternary 
nucleation®. We will use the label (n,m) to indicate the number 
of sulphuric acid (nSA) and DMA (mDMA) molecules in pure 
Figure 2 | Contribution of DMA and ions to amine ternary nucleation. 
Measurements recorded at 38% RH and 278 K. a, Nucleation rates, Jn, Jgcr and 
Jπ, as a function of DMA mixing ratio. b, Ion-induced fractions, Jiin/Jgcr and 
Jiin/Jπ, as a function of Jgcr or Jπ, at DMA = 3-140 p.p.t.v. In a, all nucleation 
rates are scaled to [H2SO4] = 2.0 × 10⁶ cm⁻³ (0.08 p.p.t.v.) using the fitted 
slopes in Fig. 1. The point at 0.1 p.p.t.v. DMA shows the mean projected Jgcr 
measurement at contaminant-level DMA and NH3. The bars indicate 1σ total 
errors and include correlated systematic contributions. Theoretical 
expectations are shown by dashed red lines (sticking probability of 0.5 for 
neutral-neutral collisions and 1.0 for charged-neutral collisions) and 
uncertainties by orange bands (sticking probabilities for neutral-neutral 
collisions between 0.1 and 1.0). 


SA·DMA clusters, where n and m include both neutral and charged 
species. Negatively charged nucleation (Fig. 3a) proceeds as follows. 
The first step is dimer (2,0) formation: HSO4⁻·H2SO4 (for simplicity 
the ‘HSO4⁻’ ion implies either HSO4⁻ or HSO5⁻). This constitutes an 
acid-base pair because HSO4⁻ is a Lewis base (an electron pair donor). 
Consequently the first negatively charged cluster to which DMA can 
bind to form an acid-base pair is the acid trimer. The most abundant 
acid trimer contains two DMA molecules (3,2). Thereafter, each addi- 
tional acid molecule is stabilized by one additional DMA molecule, 
following a sequence of acid-base pairs: (3,2) → (4,3) → (5,4) → (n, 
n − 1). Our calculations suggest that the process involves mainly the 
accretion of SA·DMA (dimethylaminium bisulphate) clusters, but it 
may also involve the stepwise addition of an SA molecule followed by a 
DMA molecule. Beyond (7,6) clusters, there is evidence for further 
neutralization of the acid by additional DMA (partial formation of 
Figure 3 | Mass and molecular composition of charged clusters during a 
nucleation event with DMA. Molecular composition of charged clusters 
measured by the APi-TOF for Jgcr = 1.2 cm⁻³ s⁻¹, 4.0 × 10⁶ cm⁻³ [H2SO4], 
11 p.p.t.v. NH3, 9.4 p.p.t.v. DMA, 38% RH and 278 K. a, Negative particles. 
b, Positive particles. Cluster mass/charge, m/z, defect (difference from integer 
m/z) is plotted against m/z; each circle represents a distinct molecular 
composition and its area represents counts s⁻¹. The labels (n, m) indicate the 
number of sulphuric acid (nSA) and DMA (mDMA) molecules in pure clusters 
of SA and DMA, including both neutral and charged species. The addition of a 
single SA (H2SO4) or DMA (C2H7N) molecule to any cluster displaces it on the 
plot by a vector distance indicated by the grey arrows in b. Red circles represent 
pure SA clusters; green circles are clusters containing SA and DMA alone; black 
circles contain ammonia in addition (only appearing in some clusters above 
m/z = 900); other clusters (mostly containing light organic contaminants) are 
grey circles. Water molecules evaporate rapidly in the APi-TOF and are not 
detected (see Supplementary Information). 


dimethylaminium sulphate). Positively charged nucleation (Fig. 3b) 
proceeds similarly. Here DMA·H⁺ is a Lewis acid and so binds only 
weakly with H2SO4. Hence the first positively charged cluster is a 
DMA·H⁺ ion together with a single SA·DMA acid-base pair (1,2). 
Thereafter, the cluster grows by the accretion of SA·DMA pairs, 
exactly as seen for negatively charged clusters. No DMA·H⁺ mono- 
mer is detected because its mass-to-charge ratio (m/z), 46, is below the 
APi-TOF cutoff, as configured for these experiments. 
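
The (n, m) labelling and the growth sequences described above can be illustrated with a short Python sketch. It is an approximation for orientation only: the function names are hypothetical, average molar masses are assumed (98.08 for H2SO4, 45.08 for C2H7N and 1.008 for H) rather than the exact isotopic masses resolved by the APi-TOF, the electron mass is neglected, and every cluster is taken to be singly charged.

```python
M_SA = 98.08    # H2SO4, approximate average molar mass (assumed value)
M_DMA = 45.08   # C2H7N, approximate average molar mass (assumed value)
M_H = 1.008     # hydrogen


def mz_negative(n, m):
    """Approximate m/z of a negative (n, m) cluster; n includes one HSO4- ion."""
    if n < 1:
        raise ValueError("a negative cluster needs at least one acid (the HSO4- ion)")
    return n * M_SA + m * M_DMA - M_H   # one proton removed, charge -1


def mz_positive(n, m):
    """Approximate m/z of a positive (n, m) cluster; m includes one DMA.H+ ion."""
    if m < 1:
        raise ValueError("a positive cluster needs at least one DMA (the DMA.H+ ion)")
    return n * M_SA + m * M_DMA + M_H   # one proton added, charge +1


def negative_sequence(n_max):
    """Acid-base pair sequence (3,2) -> (4,3) -> ... -> (n, n-1)."""
    return [(n, n - 1) for n in range(3, n_max + 1)]


def positive_sequence(n_max):
    """Sequence starting at (1,2) and growing by one SA.DMA pair per step."""
    return [(n, n + 1) for n in range(1, n_max + 1)]


dma_h_monomer = mz_positive(0, 1)   # the DMA.H+ monomer discussed in the text
```

The DMA·H⁺ monomer rounds to m/z 46 in this approximation, matching the value given above, and the bisulphate ion (1,0) rounds to 97.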

Because both HSO4⁻ and DMA are Lewis bases, each can form an 
acid-base pair with H2SO4. In fact HSO4⁻ is the stronger base, as 
demonstrated by its much stronger binding energy with H2SO4 
(Supplementary Table 1)*. The only fundamental difference is that 
not more than one HSO4⁻ ion can be present in the cluster because 
of electrostatic repulsion. So, although the APi-TOF measures only 
charged clusters in the CLOUD chamber, this suggests that neutral 
nucleation proceeds by the same mechanism, namely the initial forma- 
tion of an acid-base pair (SA·DMA), equivalent to the acid-base pair 
(SA·HSO4⁻) seen in charged nucleation (Fig. 3a), and subsequently 
the accretion of additional SA·DMA pairs. This is also indicated 
by the Atmospheric Cluster Dynamics Code (ACDC) model (see 
Supplementary Information and Extended Data Fig. 4). 

There is direct experimental evidence to support this picture of the 
neutral nucleation mechanism. Figure 4 shows a plot of the concentration 
Figure 4 | Plot of neutral H2SO4 dimer against monomer concentrations 
before and after the addition of DMA. Concentrations were measured by the 
CIMS in CLOUD without DMA (open circles) and with 3-140 p.p.t.v. DMA 
and 10 p.p.t.v. NH3 (coloured circles), at 38% RH and 278 K. Ions are absent 
from the CLOUD chamber (the clearing field is on). The bars indicate 1σ 
counting errors. The fitted red curve through the DMA data shows a quadratic 
dependence on monomer concentration. The other curves show the expected 
neutral dimer concentrations for the binary H2SO4-H2O system (short-dashed 
black line)’*, for production in the CIMS ion source (dashed black line and grey 
uncertainty band) and for 10 p.p.t.v. DMA in the ACDC model, assuming 0.5 
sticking probability (dashed red line). The orange band shows the model 
uncertainty range (sticking probabilities between 0.1 and 1.0). The brown curve 
indicates the upper limit of the dimer concentration calculated with the ACDC 
model, which is close to the kinetic limit (unit sticking probability and 
negligible evaporation). 


of the neutral acid dimer against that of the neutral acid monomer, 
measured with the chemical ionization mass spectrometer (CIMS) before 
and after the addition of DMA, when the clearing field was present 
(implying that there were only neutral clusters in the CLOUD chamber). 
We infer from the observed absence of DMA on the negatively charged 
monomer or dimer (Fig. 3a) that, after charging in the CIMS, clusters 
containing one H2SO4 molecule will be detected as DMA-free charged 
monomers, and clusters containing two H2SO4 molecules will be detected 
as DMA-free charged dimers—regardless of whether or not they were 
originally clustered with DMA. Before adding any DMA, the dimer 
concentrations are consistent with the expected production in the 
CIMS ion source. However, with 5 p.p.t.v. DMA or more, the dimer 
concentrations are about six orders of magnitude higher than those 
expected for a pure binary system”. The concentration of neutral acid 
dimer with DMA approaches the kinetic limit, indicating highly stable 
clusters with negligible evaporation, and supporting the neutral nuc- 
leation mechanism inferred above. 
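
A power-law dependence such as the quadratic relation between dimer and monomer concentrations can be checked with a straight-line fit in log-log space. The Python sketch below is a generic least-squares fit and not the fitting procedure used for the figures; the function name is hypothetical and the data are synthetic, generated from an arbitrary prefactor purely to show that the fit recovers an exponent of 2.

```python
import math


def fit_power_law(x_values, y_values):
    """Fit y = a * x**n by ordinary least squares on log10(x), log10(y).

    Returns (n, a). All inputs must be positive.
    """
    log_x = [math.log10(v) for v in x_values]
    log_y = [math.log10(v) for v in y_values]
    count = len(log_x)
    mean_x = sum(log_x) / count
    mean_y = sum(log_y) / count
    s_xx = sum((lx - mean_x) ** 2 for lx in log_x)
    s_xy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = s_xy / s_xx                      # exponent n
    intercept = mean_y - slope * mean_x      # log10 of the prefactor a
    return slope, 10.0 ** intercept


# Synthetic monomer and dimer concentrations (cm^-3), NOT measured values.
monomer = [1.0e6, 2.0e6, 5.0e6, 1.0e7]
dimer = [1.0e-9 * c ** 2 for c in monomer]   # arbitrary prefactor for illustration

exponent, prefactor = fit_power_law(monomer, dimer)
```

An exponent near 2 is what is expected when dimer formation is driven by monomer-monomer collisions and evaporation is negligible, which is the situation the kinetic limit describes.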

A previous experiment” measured unexpectedly high dimer con- 
centrations in a laminar flow tube and concluded that a stabilizing 
contaminant must be present, although none was measured. This was 
proposed” to explain the high ‘binary’ nucleation rates previously 
measured in ref. 28 in the same flow tube. Another experiment’ 
measured high dimer formation rates linked to amine mixing ratios 
of about 1 part per billion by volume and above. Our results directly 
link high concentrations of neutral H2SO4 dimer with amines at atmo- 
spheric levels. 

The results reported here show that nucleation in the atmospheric 
boundary layer is highly sensitive to trace amine levels of only a few 
p.p.t.v. Sulphuric acid-amine nucleation is found to proceed by the 
same base-stabilization mechanism as that previously observed for 
ammonia, in which each additional acid molecule in the cluster is 
stabilized by one (or occasionally, two) base molecules’. However, 
the acid-base pairs that sulphuric acid forms with amines are more 
tightly bound than with ammonia, resulting in cluster formation rates 
that approach the kinetic limit. Little increase is seen above 5 p.p.t.v. 
DMA, indicating that nucleation at atmospheric H2SO4 concentra- 
tions (below 3 × 10⁷ cm⁻³ or 1.2 p.p.t.v.) is then limited by the avail- 
ability of H2SO4 and not that of DMA. Our experimental rate and 
molecular measurements are well reproduced by a dynamical model 
based on quantum chemical calculations of binding energies of 
molecular clusters. 

Although measurements of ambient gas-phase amines are rare, 
mixing ratios of a few p.p.t.v. in the continental boundary layer have 
been reported'”””°, suggesting that sulphuric acid-amine nucleation 
is likely to be an important atmospheric process. However, atmo- 
spheric observations indicate both the presence’*'”"* and the absence” 
of a significant amine fraction in newly formed particles, which sug- 
gests considerable variability of ambient amine levels. Although 
amines are volatile vapours, our measurements show that sulphate 
particles constitute an almost perfect sink (negligible evaporation). 
However, unlike H2SO4, amine vapours are directly emitted from 
sources in their chemically active form and so they will be localized 
to source regions, with highly variable concentrations that depend on 
ambient sulphate particle sinks and OH radical levels (the DMA oxida- 
tion lifetime is about 4 h at 10⁶ cm⁻³ [OH]). Amines can therefore 
explain only a part of atmospheric nucleation. Indeed, our measure- 
ments leave open the possibility that nucleation may also proceed with 
other atmospheric vapours, such as highly oxidized organic species of 
very low volatility. In such cases, extremely low amine concentrations 
may still enhance nucleation by forming stable acid-base pairs with 
some fraction of the sulphuric acid molecules in an embryonic cluster 
(constituting at least four-component nucleation). 

The ion-induced contribution to amine ternary nucleation is gen- 
erally small, except at low overall formation rates. Ions can enhance 
nucleation either by an increased collision rate between a charged 
cluster and polar molecules (such as H2SO4 or H2SO4·DMA) or by 
an increased cluster binding energy (and hence decreased evaporation 
rate). Because neutral clusters of H2SO4 and DMA are highly stable, 
charge offers little competitive advantage. Taken together with pre- 
vious CLOUD measurements’, this suggests that ions can be signifi- 
cant in atmospheric particle formation provided that the associated 
neutral particles have appreciable evaporation and provided that the 
overall nucleation rates are low and below the ion-pair production rate. 

The Intergovernmental Panel on Climate Change (IPCC) considers 
that the increased amount of aerosol in the atmosphere from human 
activities constitutes the largest present uncertainty in climate radiative 
forcing’ and projected climate change this century”. The results 
reported here show that the uncertainty is even greater than previously 
thought, because extremely low amine emissions—which have sub- 
stantial anthropogenic sources and have not hitherto been considered 
by the IPCC—have a large influence on the nucleation of sulphuric 
acid particles. Moreover, amine scrubbing is likely to become the 
dominant technology for CO2 capture from fossil-fuelled power 
plants, so anthropogenic amine emissions are expected to increase in 
the future*®. If amine emissions were to spread into pristine regions of 
the boundary layer where they could switch on nucleation, substantial 
increases in regional and global cloud condensation nuclei could 
occur. This underscores the importance of monitoring amine emis- 
sions—as well as those of sulphur dioxide—when assessing the impact 
of anthropogenic activities on the radiative forcing of regional and 
global climate by aerosols. 


METHODS SUMMARY 


CLOUD is designed to study the effects of cosmic rays on aerosols, cloud droplets 
and ice particles, under precisely controlled laboratory conditions. The CLOUD 
chamber and gas system have been built to the highest technical standards of 
cleanliness and performance. Owing to its large volume (26 m³) and highly stable 
operating conditions, the chamber allows nucleation rates to be measured reliably 
over a wide range from 0.001 cm⁻³ s⁻¹ to well above 100 cm⁻³ s⁻¹. The loss rate of 
condensable vapours onto the walls of the chamber is comparable to the con- 
densation sink rate onto ambient aerosols under pristine atmospheric boundary 
layer conditions. The experiment has several unique aspects, including precise 
control of the ‘cosmic ray’ beam intensity from the CERN Proton Synchrotron, 
the capability to create an ion-free environment with an internal electric clearing 
field, precise and uniform adjustment of the H2SO4 concentration by means of 
ultraviolet illumination from a fibre-optic system, and highly stable operation at 
any temperature between 203 and 300 K. The contents of the chamber are con- 
tinuously analysed by a suite of instruments connected to sampling probes that 
project into the chamber. 

The experimental measurements are compared with theoretical expectations 
based on a dynamical model in which collision and coagulation rates are computed 
from kinetic gas theory. Equilibrium constants are computed from quantum 
chemical calculations of binding energies of molecular clusters, and evaporation 
and cluster fission rates are then obtained from detailed balance. All possible 
cluster-cluster processes are included. The electrostatic enhancement of ion- 
molecule collisions is calculated by using dipole moments and polarizabilities 
obtained from quantum chemistry. The model has no fitted parameters. 
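
The step from binding energies to evaporation rates by detailed balance can be sketched as follows. This Python example assumes the commonly used form γ = β (p_ref / k_B T) exp(ΔG / RT), where ΔG is the formation free energy of the cluster minus those of its two fragments; whether the model uses exactly this expression, and the reference pressure of 101,325 Pa, are assumptions, and the function names and numerical inputs are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
R_KCAL = 1.987204e-3   # molar gas constant, kcal mol^-1 K^-1


def evaporation_rate(beta_m3_s, delta_g_kcal_mol, temperature_k, p_ref_pa=101325.0):
    """Evaporation rate (s^-1) of a cluster into two fragments by detailed balance.

    beta_m3_s        : collision rate coefficient of the two fragments (m^3 s^-1)
    delta_g_kcal_mol : G(cluster) - G(fragment 1) - G(fragment 2); negative if bound
    p_ref_pa         : reference pressure of the free energies (assumed 1 atm)
    """
    n_ref = p_ref_pa / (K_B * temperature_k)   # reference number density, m^-3
    return beta_m3_s * n_ref * math.exp(delta_g_kcal_mol / (R_KCAL * temperature_k))


def sensitivity_factor(shift_kcal_mol, temperature_k):
    """Factor by which the evaporation rate changes for a shift in binding energy."""
    return math.exp(shift_kcal_mol / (R_KCAL * temperature_k))


# Effect of a 1 kcal/mol weaker binding at 278 K (inputs are illustrative only).
factor_278 = sensitivity_factor(1.0, 278.0)
```

At 278 K a shift of 1 kcal mol⁻¹ changes the evaporation rate by about a factor of six in this form, which gives a sense of how strongly computed binding energies control the modelled nucleation rates.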


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


The key features of the CLOUD experiment (Extended Data Fig. 1) are a large- 
volume (26 m³) stainless steel chamber; precise (±0.01 K) temperature control at 
any tropospheric temperature; precise delivery of selected trace gases and ultra- 
pure humidified synthetic air; precise and uniform adjustment of the H2SO4 
concentration by means of ultraviolet illumination from a fibre-optic system; 
suppression of contaminant vapours at the technological limit; an adjustable π⁺ 
beam from the CERN Proton Synchrotron to simulate cosmic rays; and the ability 
to simulate an ion-free environment by applying an electric field to sweep ions 
from the chamber. 

A comprehensive array of state-of-the-art instruments continuously samples 
and analyses the contents of the chamber. For the results reported here, the 
instruments included a chemical ionization mass spectrometer for H2SO4 con- 
centration*’, two APi-TOFs (TOFWERK AG and Aerodyne Research, Inc.)* for 
the molecular composition of positive and negative charged clusters, several con- 
densation particle counters (CPCs) with 50% detection efficiency thresholds near 
2 nm (two Airmodus A09 particle size magnifiers (PSMs)”’, two diethylene glycol 
CPCs (DEG-CPCs)****, a TSI 3776 CPC and a TSI 3786 CPC), a scanning mobility 
particle sizer (SMPS), a neutral cluster and air ion spectrometer (NAIS)**, a proton 
transfer reaction time-of-flight mass spectrometer for organic vapours’, and an 
ion chromatograph for measurements of NH3 and DMA concentrations”. 

Two particle counters were operated in a continuously stepped scanning mode 
to provide measurements of particle growth rates at small sizes: first, a PSM whose 
detection threshold was varied between about 1 and 2.5 nm, and second, the TSI 
3786 with a variable laminar flow rate through its sampling probe, leading to a 
cutoff size between about 2.5 and 5 nm. The H2SO4 concentration is derived from 
channels 97 (HSO4⁻) and 160 (HNO3·HSO4⁻) of the CIMS, which measure the 
sulphuric acid monomer signal after charging in the CIMS ion source. The sul- 
phuric acid dimer concentration measured by the CIMS is derived assuming the 
same calibration factor as for monomers. 

Nucleation rates Jn, Jgcr and Jπ (cm⁻³ s⁻¹) were measured as follows. Neutral 
nucleation rates are measured with no pion beam and with the field cage electrodes 
set to ±30 kV, which establishes an electric field of about 20 kV m⁻¹ in the cham- 
ber. This completely suppresses ion-induced nucleation because, under these 
conditions, small ions or molecular clusters are swept from the chamber in about 
1 s. Because all of the nucleation processes under consideration take place on 
substantially longer time scales, neutral nucleation rates can be measured with 
zero background from ion-induced nucleation. For GCR and beam conditions, the 
electric field was set to zero, leading to ion pair concentrations of about 650 cm⁻³ 
for Jgcr, representative of the boundary layer, and about 3,000 cm⁻³ for Jπ, repres- 
entative of the top of the troposphere. Both Jgcr and Jπ comprise the sum of neutral 
and ion-induced nucleation rates, at their respective ion concentrations, whereas 
Jn measures the neutral rate alone. 
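
The quoted sweep time of about 1 s can be made plausible with a simple drift estimate. The Python sketch below is an order-of-magnitude illustration under stated assumptions, not a description of the chamber's field geometry: it assumes the electrodes sit at +30 kV and −30 kV so that a field of 20 kV m⁻¹ implies a separation of about 3 m, and it assumes a small-ion mobility of 1.5 cm² V⁻¹ s⁻¹, a typical textbook value that is not given in the text. The function name is hypothetical.

```python
def ion_sweep_time_s(distance_m, field_v_per_m, mobility_m2_per_vs):
    """Time for an ion to drift a given distance in a uniform electric field."""
    drift_velocity = mobility_m2_per_vs * field_v_per_m   # m/s
    return distance_m / drift_velocity


# Assumed separation: 60 kV potential difference / 20 kV per metre = 3 m.
separation_m = 60.0e3 / 20.0e3

# Assumed small-ion mobility: 1.5 cm^2 V^-1 s^-1 = 1.5e-4 m^2 V^-1 s^-1.
sweep_time = ion_sweep_time_s(separation_m, 20.0e3, 1.5e-4)
```

Under these assumptions the drift time is of order one second, consistent with the statement that ions are removed on a time scale far shorter than the nucleation processes.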

The nucleation rates are obtained from the formation rates, dN_dth/dt (where the 
subscript d_th refers to the detection threshold diameter of the particle counter). The 
nucleation rates, J_1.7, are determined at 1.7 nm mobility diameter (1.4 nm mass 
diameter) after correcting for losses between 1.7 nm and the detection size thresh- 
old**’. A diameter of 1.7 nm corresponds to a cluster considered to be above the 
critical size and therefore thermodynamically stable. The critical size, which corre- 
sponds to equal evaporation and growth rates of the cluster, varies with temperature, 
chemical species and concentrations, and may even be absent when evaporation 
rates are highly suppressed as in the case of sulphuric acid-DMA clusters. Because 
the loss rate of freshly nucleated particles to the chamber walls is comparable to the 
rate at which they are lost in the atmosphere to pre-existing aerosols under pristine 
boundary-layer conditions, the reported formation rates at 1.7 nm size should corre- 
spond reasonably well to atmospheric observations of new particle formation. 

Before J_1.7 is calculated, the measured particle number concentrations versus 
time are corrected**”° in two sequential steps for the loss of particles due to the 
chamber walls, dilution and coagulation: first, particle losses above d_th, and 
second, particle losses during growth from 1.7 nm to d_th. The wall loss rate is 
1.7 × 10⁻³ s⁻¹ for H2SO4 monomers and decreases with increasing cluster dia- 
meter as 1/d. The dilution lifetime is 3-5 h, depending on the total sampling rate 
of all instruments attached to the chamber. Correction 2 above requires know- 
ledge of the particle growth rate. This is determined experimentally from the 
different rise times measured in a scanning PSM, which detects particles over 
a range of threshold diameters between 1 and 2.5 nm. The growth rates were 
verified with several other instruments, including a fixed-threshold PSM, two 
DEG-CPCs, a TSI 3776, two APi-TOFs, a NAIS and a SMPS. Because instru- 
mentally determined growth rates were not available for all runs, a parameteriza- 
tion was derived to allow the growth rate to be calculated for every run. 

The detection thresholds of the particle counters do not represent perfect step 
functions, so particles with smaller diameters are detected to some extent. This 
leads to over-counting, which becomes a more important (and more uncertain) 
correction as the CPC threshold approaches 1.7 nm. For this reason, the nuc- 
leation rates reported here are based on a TSI 3776 CPC with d_th = 3.2 nm 
since, although requiring the largest corrections for losses between 1.7 nm and 
the detection threshold, it has negligible sensitivity to clusters below 1.7 nm. To 
confirm the nucleation rates obtained with the 3.2 nm CPC, they were derived 
independently from the other CPCs with lower detection thresholds and verified 
to agree within systematic uncertainties. 

The ion-induced nucleation rate, Jiin, for positive and negative particles is measured 
with the NAIS. This provides the most sensitive determination of the ion-induced 
fraction, Jiin/Jtotal, because the NAIS measures only charged clusters. Loss corrections 
are applied to the charged cluster spectra to account for wall losses, dilution and ion- 
ion recombination. In addition a source correction is applied to account for diffusion 
charging of neutral clusters by small ions. The latter correction requires knowledge of 
the number concentrations of small ions and of neutral clusters versus particle 
diameter. The neutral cluster concentrations are measured with the 3.2nm TSI 
3776 CPC and their size spectra are measured with the SMPS. The small-ion
concentrations are measured with the AIS⁺ and AIS⁻. The charging (collision)
probabilities are determined using the collision kernels versus diameter from ref. 41.

The error on J_1.7 has three components that are added together in quadrature to
estimate the total error indicated in Figs 1 and 2a. The first is a statistical
measurement error derived from the scatter of the particle counter measurements,
evaluated separately for each nucleation event; the second is an estimated +50%/−33%
uncertainty on the calculated correction factor, J_1.7/J_dth, where J_dth is the
nucleation rate at size d_th obtained after correcting dN_dth/dt for detection losses.
The third is a ±30% systematic uncertainty on J_dth estimated from the run-to-run
reproducibility of dN_dth/dt under nominally identical chamber conditions.

The error on J_iin has two main components. The first is a statistical measurement
error derived from the scatter of the NAIS measurements, evaluated separately
for each nucleation event. The second is an estimated ±50% error to account
for the uncertainty in the correction for diffusion charging of neutral clusters by
small ions. These errors are added together in quadrature with the error on J_1.7 to
estimate the error on the ion-induced fraction, J_iin/J_1.7, shown in Fig. 2b.
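
The two error budgets described above can be illustrated with a minimal Python sketch. It assumes that all components are fractional errors, that the upper and lower sides of the asymmetric correction-factor uncertainty are combined separately, and that the reproducibility and charging-correction terms are symmetric. The function names and the 10% statistical scatter are hypothetical, chosen only for illustration.

```python
import math

def quadrature(*fractional_errors):
    """Combine independent fractional errors in quadrature."""
    return math.sqrt(sum(e * e for e in fractional_errors))

def j17_error(stat):
    """Return (upper, lower) fractional error on J_1.7.

    Components taken from the text: statistical scatter `stat`,
    +50%/-33% on the correction factor, and 30% run-to-run
    reproducibility (treated here as symmetric, an assumption).
    """
    upper = quadrature(stat, 0.50, 0.30)
    lower = quadrature(stat, 0.33, 0.30)
    return upper, lower

def fraction_error(stat_iin, j17_err):
    """Fractional error on J_iin/J_1.7: NAIS scatter, 50% diffusion-charging
    correction, and the error on J_1.7, all added in quadrature."""
    return quadrature(stat_iin, 0.50, j17_err)

# Hypothetical 10% statistical scatter on both measurements.
upper, lower = j17_error(0.10)
print(f"J_1.7 error: +{upper:.0%} / -{lower:.0%}")
print(f"J_iin/J_1.7 error (upper side): {fraction_error(0.10, upper):.0%}")
```

Because the terms add in quadrature, the largest single component (the correction factor, or the charging correction for the ion-induced fraction) dominates the total unless the statistical scatter is comparably large.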

The overall experimental uncertainty on [H₂SO₄] measured by the CIMS is
estimated to be +100%/−50%, on the basis of three independent measurements:
particle growth rate under binary nucleation conditions, the depletion rate of
[SO₂] by photo-oxidation, and an external calibration source’. However, the
run-to-run relative experimental uncertainty on [H₂SO₄] is smaller, typically
±10%. In deriving the sulphuric acid dimer concentrations measured by the
CIMS (Fig. 4), we assumed the same charging efficiency by the ion source as for
monomers. The concentrations of SO₂ and O₃ are measured with calibrated
instruments and are known to ±10%. The overall uncertainty on the NH₃ mixing
ratio is estimated to be +100%/−50%. The point-to-point uncertainty on the
DMA mixing ratio is estimated to be ±(11% + 12%/[DMA] (p.p.t.v.)), with an
overall scale uncertainty of +50%/−33%. The minimum directly measurable
values are 2 p.p.t.v. for NH₃ and 0.2 p.p.t.v. for DMA. However, lower values
can be estimated from precise calibration of the trace gas delivery systems together
with molecular analysis of the nucleating clusters in the APi-TOFs.
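
As a worked illustration of the point-to-point DMA uncertainty quoted above, the sketch below reads the expression as a fractional error of 11% plus 12% divided by the DMA mixing ratio in p.p.t.v. That reading, and the function name, are assumptions made for this example.

```python
def dma_point_uncertainty(dma_pptv):
    """Fractional point-to-point uncertainty on the DMA mixing ratio,
    read as 11% + 12%/[DMA], with [DMA] in p.p.t.v."""
    if dma_pptv <= 0:
        raise ValueError("mixing ratio must be positive")
    return 0.11 + 0.12 / dma_pptv

# The relative uncertainty grows quickly towards the 0.2 p.p.t.v.
# minimum directly measurable value quoted in the text.
for dma in (0.2, 1.0, 10.0):
    print(f"{dma:5.1f} p.p.t.v. -> {dma_point_uncertainty(dma):.0%}")
```

Under this reading the constant term dominates at mixing ratios well above 1 p.p.t.v., whereas the 1/[DMA] term dominates near the detection limit.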

To compare the CLOUD measurements with theoretical expectations, all pos- 
sible collision, coagulation, evaporation and fragmentation reactions have been 
explicitly simulated for a certain set of clusters. Collision and coagulation rates 
are computed from kinetic gas theory, while evaporation and fragmentation rates 
are obtained from quantum chemistry”. Dynamical simulations were performed 
with the ACDC model” to calculate the formation of neutral, positively charged 
and negatively charged clusters containing sulphuric acid, ammonia and DMA. The 
electrostatic enhancement of ion-molecule collisions is calculated using dipole 
moments and polarizabilities obtained from quantum chemistry. The model has 
no fitted parameters. As a result of computing limitations, the formation and 
evaporation of clusters containing up to four sulphuric acid and four base molecules 
(mobility diameters 1.2-1.4 nm) have been modelled so far. The diameters of the
largest computed clusters are smaller than the 1.7 nm size at which the experimental
formation rates (J_1.7) are determined. The modelled formation rates can therefore
be expected to overestimate the CLOUD measurements somewhat. Further 
description of the ACDC model is provided in Supplementary Information. 

Extended Data Figure 1 | Schematic diagram of the CLOUD experiment at the CERN Proton Synchrotron.

Extended Data Figure 2 | Theoretical dependence of amine ternary nucleation rates
on RH. Modelled neutral nucleation rates as a function of RH (left-hand scale) at
2.0 × 10⁶ cm⁻³ [H₂SO₄] and 278 K, and either 0.1 p.p.t.v. DMA (purple curve) or
10 p.p.t.v. DMA (red curve). The nucleation rates relative to their value at 38% RH
are shown on the right-hand scale (dashed purple and red curves).
Figure legend: Theory (ACDC); solid curves, J_n (0.1 or 10 p.p.t.v. DMA + 0 p.p.t.v.
NH₃); dashed curves, ratio J_n(RH)/J_n(RH = 38%).

Extended Data Figure 3 | Theoretical dependence of ammonia ternary and amine
ternary nucleation rates on temperature. Modelled GCR nucleation rates as a
function of temperature (left-hand scale) at 2.0 × 10⁶ cm⁻³ [H₂SO₄] and either
2.0 × 10⁸ cm⁻³ [NH₃] (blue curve) or 2.0 × 10⁸ cm⁻³ [DMA] (red curve). (A
concentration of 2.0 × 10⁸ cm⁻³ is equivalent to mixing ratios between 7.0 p.p.t.v.
at 255 K and 8.2 p.p.t.v. at 300 K.) The sulphuric acid-DMA nucleation rate relative
to the value at T = 278 K is shown on the right-hand scale (dashed red line). In the
sulphuric acid-DMA system a sticking probability of 0.5 is assumed for all
neutral-neutral collisions, and 1.0 for all charged-neutral collisions.
Figure legend: Theory (ACDC); solid curves, J_gcr (2.0 × 10⁸ cm⁻³ [NH₃]) and J_gcr
(2.0 × 10⁸ cm⁻³ [DMA]); dashed curve, ratio J_gcr(T)/J_gcr(T = 278 K) for DMA.
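
The equivalence between a number concentration of 2.0 × 10⁸ cm⁻³ and mixing ratios of about 7.0 p.p.t.v. at 255 K and 8.2 p.p.t.v. at 300 K follows from the ideal gas law. The sketch below is a minimal check that assumes a total pressure of one standard atmosphere, which is not stated in the legend; the function name is hypothetical.

```python
BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def mixing_ratio_pptv(conc_cm3, temp_k, pressure_pa=101325.0):
    """Convert a number concentration (molecules per cm^3) into a mixing
    ratio in parts per trillion by volume, using n_air = p / (k T).
    The default pressure of 1 atm is an assumption of this sketch."""
    air_per_m3 = pressure_pa / (BOLTZMANN * temp_k)
    air_per_cm3 = air_per_m3 * 1e-6
    return conc_cm3 / air_per_cm3 * 1e12

for temp in (255.0, 300.0):
    print(f"{temp:.0f} K: {mixing_ratio_pptv(2.0e8, temp):.2f} p.p.t.v.")
```

At fixed pressure the air number density falls as 1/T, so the same number concentration corresponds to a larger mixing ratio at higher temperature, which is the trend quoted in the legend.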

Extended Data Figure 4 | Theoretical concentrations of negative, positive
and neutral clusters during DMA ternary nucleation. Modelled steady-state
concentrations (mDMA versus nSA) at 4.0 × 10⁶ cm⁻³ [H₂SO₄], 10 p.p.t.v.
DMA, 4 ion pairs cm⁻³ s⁻¹ and 278 K. a, Negative clusters. b, Positive clusters.
c, Neutral clusters. A sticking probability of 0.5 is assumed for all
neutral-neutral collisions and 1.0 for all charged-neutral collisions. The
numbers below the centre of each circle show log₁₀C, where C (cm⁻³) is the
cluster concentration (the threshold is 0.01 cm⁻³). The circle areas within each
panel are proportional to C (with the exception of the DMA monomer in c).

Chelicerate neural ground pattern in a Cambrian
great appendage arthropod


Gengo Tanaka’, Xianguang Hou’, Xiaoya Ma”, Gregory D. Edgecombe? & Nicholas J. Strausfeld* 


Preservation of neural tissue in early Cambrian arthropods has 
recently been demonstrated’, to a degree that segmental structures 
of the head can be associated with individual brain neuromeres. This 
association provides novel data for addressing long-standing contro- 
versies about the segmental identities of specialized head appendages 
in fossil taxa”*. Here we document neuroanatomy in the head and 
trunk of a ‘great appendage’ arthropod, Alalcomenaeus sp., from the
Chengjiang biota, southwest China, providing the most complete 
neuroanatomical profile known from a Cambrian animal. Micro- 
computed tomography reveals a configuration of one optic neuropil 
separate from a protocerebrum contiguous with four head ganglia, 
succeeded by eight contiguous ganglia in an eleven-segment trunk. 
Arrangements of optic neuropils, the brain and ganglia correspond 
most closely to the nervous system of Chelicerata of all extant arth- 
ropods, supporting the assignment of ‘great appendage’ arthropods 
to the chelicerate total group**. The position of the deutocerebral 
neuromere aligns with the insertion of the great appendage, indi- 
cating its deutocerebral innervation and corroborating a homology 
between the ‘great appendage’ and chelicera indicated by morpho- 
logical similarities**’. Alalcomenaeus and Fuxianhuia protensa' demon- 
strate that the two main configurations of the brain observed in modern 
arthropods, those of Chelicerata and Mandibulata, respectively*, had 
evolved by the early Cambrian. 

Cambrian ‘great appendage’ arthropods, collectively known as 
Megacheira’, are variously regarded as mono-, para- or polyphyletic
and variously interpreted as stem-group chelicerates*” or as stem-group
arthropods*"®. They are characterized by raptorial cephalic appendages 
with an ‘elbow joint’ between a proximal pedunculate part and a distal 
part bearing an elongate spine on each article’. Morphologically and 
taxonomically best understood are Leanchoiliidae, which have three 
long spine-bearing or spiniform articles on the great appendage, each 
with a flagellate distal part't’’. 

YKLP 11075 is a leanchoiliid from the early Cambrian Chengjiang 
biota (Yu’anshan Member, Heilinpu Formation), Yunnan Province,
southwest China’*. It is preserved as part and counterpart in dorso- 
ventral aspect, exhibiting the cephalic shield and 11 complete trunk 
segments (Figs 1a and 2a). It is assigned to Alalcomenaeus (Fig. 3a, b) 
rather than the closely allied Leanchoilia (Fig. 3c), abundantly repre- 
sented in the Chengjiang biota by Leanchoilia illecebrosa. Distinction 
from Leanchoilia is based on the straight (rather than pointed) anterior 
margin of the cephalic shield and rounded, paddle-shaped (rather than 
lanceolate) telson’* (Fig. 3a). Studied specimens are similar to Alalcome- 
naeus cambricus from the Burgess Shale, Canada, having four rather 
than three pairs of cephalic appendages, as described for A. cambricus’’. 
We describe them in open nomenclature, that is, Alalcomenaeus sp., 
because most of the Alalcomenaeus material from the Chengjiang biota 
has not been studied in detail. 

Paired eyes, well separated by exoskeleton, are preserved each side 
of the midline at the anterior margin of the cephalic shield (Fig. 2a). 
Each eye is approximately 0.75 mm wide, composed of approximately 


Figure 1 | Alalcomenaeus sp. from the Chengjiang Lagerstätte. a, Incident
light photograph, dorsal view of montaged part and counterpart YKLP 11075.
b, Energy-dispersive X-ray fluorescence (EDXRF) Fe profile. Double-headed
arrow indicates rostro-caudal extent of oesophageal foramen. c, MicroCT scan.
d, Overlay of EDXRF Fe and microCT. e, Inverted white coincidence signal after
isolated magenta (b) and green (c) removal. Arrowheads mark posterior margin of
cephalic shield. f, Cephalon with superimposed EDXRF Cu (blue) and EDXRF Fe
(red) profiles. g, Cephalon with superimposed EDXRF Cu (blue) and CT (green)
profiles. C1, tritocerebrum; C2, C3, cephalic biramous segment neuropils; CA,
cephalic segments; GA, great appendage neuropil = deutocerebrum (deu); Oe,
oesophageal foramen; on1, first-order visual neuropil; pr, protocerebrum; T1-T8,
trunk segment neuropils; T9-11, trunk segments lacking ganglia. Scale bar, 2 mm.

Figure 2 | Details of eye pairs and visual neuropils in Alalcomenaeus sp.
YKLP 11075. a, Cephalic region; boxed areas refer to panels b-d. b, Left cornea
of left eye pair showing lenses (arrowed) surmounting pigmented area.
c, Lens rows (arrowed: left eye of right pair). d, Enlargement of left eye (ey) pair
showing trace of retinula axon bundle (r) extending to rust-coloured first-order
optic neuropil (on1) separate from the brain, but connected to it by an optic nerve
(op n) terminating in a similarly coloured domain (on2) integrated in the
protocerebrum. e, Superimposition of EDXRF Fe (red) showing strict
coincidence of detected iron at the first-order optic neuropil (on1) and underlying
protocerebral areas. Outlines of on1 and on2 are superimposed. Scale bar, 2 mm.


15 µm diameter facets (Fig. 2b, c) spaced as 20-25 rows. Each comprises
8-10 vertically arranged elements, equipping each eye with 160-250 
facets. Facets surmount a dark zone contiguous with centrally extend- 
ing traces of the retinula nerve from each eye (Fig. 2d). Nerves from a 
lateral pair of eyes converge to an obcordate rust-brown area iden- 
tified as the first visual neuropil (Fig. 2d, left). A single optic nerve 
connects it to a similarly rust-brown strip interpreted as the second 
optic neuropil integrated in the rostral volume of the protocerebrum 
(Fig. 2d). The extent of these neuropils is resolved by superimposition 
of iron distribution detected by energy-dispersive X-ray fluorescence
(EDXRF Fe) over the relevant region (Figs 1f and 2e, and Extended
Data Fig. 1).

Outlines of contiguous cerebral and thoracic neuromeres were resolved
by combining EDXRF Fe with computed tomography (CT). The EDXRF
copper (blue) signal (Extended Data Fig. 1) corresponds to much of the
internal volume of the specimen. Overlaying the EDXRF Fe (magenta)
signal demonstrates a highly constrained distribution of iron to the putative
nervous system (Fig. 1f). Overlaying the CT signal (Fig. 1c, green) and
EDXRF Fe (Fig. 1b, magenta) signal in Fig. 1d, then subtracting isolated
greens and magentas and inverting the white coincidence signal, resolves
highly constrained segmental neuromeres (Fig. 1e). In the head, EDXRF
Fe and CT overlie the protocerebrum and first-order visual neuropil
(Fig. 1f, g), which receives its input from the two eyes (Fig. 2a, d, e),
followed by an elongated neuromere corresponding to the deutocerebral
origin of the great appendage (Fig. 1e-g, GA; Extended Data Fig. 1).
Two smaller swellings indicate fused neuromeres corresponding to
the segmental origins of the first and second pairs of biramous cephalic
appendages (C1, C2) (Fig. 1e). These three neuromeres (GA, C1,
C2) flank the extended oesophageal foramen reaching rostrally to the
caudal margin of the protocerebrum (Fig. 1e). Neuropils lateral to the
foramen converge between the second (C2) and third post-GA neuromere
(C3), which is at the midline (Fig. 1e). Each neuromere aligns
with the segment of the head defined by the attachment points of the
corresponding member of the first four appendage pairs. All are resolved
anterior to the posterior margin of the head shield (open arrowheads
in Figs 1e and 3a-c).
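
The overlay-and-subtract step described above can be sketched in a few lines of Python. In an additive RGB overlay, a green CT pixel and a magenta EDXRF Fe pixel combine to white only where both signals are present, so removing isolated greens and magentas is equivalent to a logical AND of the two signal masks. This is an illustrative reconstruction of the logic only: the toy masks and function names are hypothetical, and the authors performed the operation on real images in image-editing software rather than with this code.

```python
def coincidence_mask(ct, fe):
    """Keep pixels where both the CT (green) and EDXRF Fe (magenta) masks
    are set; in an additive overlay these are the white pixels."""
    return [[c and f for c, f in zip(ct_row, fe_row)]
            for ct_row, fe_row in zip(ct, fe)]

def invert(mask):
    """Invert a binary mask so the coincidence signal appears dark (0)
    on a light (1) background."""
    return [[1 - p for p in row] for row in mask]

# Hypothetical 3 x 4 binary masks (1 = signal present).
ct_mask = [[1, 1, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 1, 1]]
fe_mask = [[0, 1, 1, 0],
           [0, 1, 0, 0],
           [1, 0, 1, 0]]

white = coincidence_mask(ct_mask, fe_mask)
for row in invert(white):
    print(row)
```

Requiring coincidence of two independent signals suppresses features that appear in only one modality, which is why the resulting outlines are more tightly constrained than either scan alone.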

In contrast to discrete ganglia linked by elongated connectives’, the 
robust post-cephalic neuromeres T1-T8 are effectively contiguous, with- 
out intervening connectives. Segments 9-11 show no detectable signal 
indicative of ganglia. These most caudal segments are equipped with 
successively smaller appendages until the broad, ovoid telson (Fig. 3a), 
which is attached to the eleventh segment. Telson flexion, such as would 
elicit reflex escape behaviours, requires power from longitudinal muscles 
within the last few trunk segments. These were probably specialized in 
segments 8-11, driven by motor neurons in ganglion T8, a similar 
arrangement existing in the extant crustacean order Mystacocarida 
in which the last four trunk segments lack ganglia but are innervated 
from post-cephalic segment T8 (ref. 16). 

Reconstruction of neuromeric topology shows critical correspon- 
dences with the nervous systems of chelicerates, exemplified by the horse- 
shoe crab Limulus polyphemus and scorpion Centruroides sculpturatus 


Lagerstätte. Lateral views. Arrowheads mark posterior margin of cephalic
shield. a, b, Alalcomenaeus sp. a, YKLP 11076. b, YKLP 11077. c, Leanchoilia
illecebrosa. YKLP 11078. C1-C3, biramous cephalic appendages 1-3; ey, eye;
GA, great appendage; T1-T3, biramous limbs of trunk segments 1-3. Scale
bars, 2 mm.
(Fig. 4). In all three taxa a single optic neuropil, which is separate from
the brain, serves each eye (Fig. 2a, d), and all taxa have an extended
oesophageal foramen flanked by the fused proto-, deuto- and tritocerebral
neuromeres (Fig. 1e). As in Limulus, the fourth rostral neuromere
also participates in this flanking arrangement. Although distinct entities,
the ganglia in the trunk of Alalcomenaeus show partial fusion, as
in Limulus.

The great appendage of Megacheira has variously been regarded as 
proto-’, deuto-* or tritocerebral*. A deutocerebral identity is consistent 
with a structural homology between the great appendage and chelicera 
with respect to the elbow joint and arrangement of fixed and movable 
fingers*’. Chelicerae are demonstrably deutocerebral based on Hox gene 
expression domains’’. The arrangement of neural structures in Alalco- 
menaeus favours a deutocerebral innervation of the great appendage. 
The position of its ventral point of origin corresponds to that of the 
second largest neuromere, immediately posterior to the protocerebrum, 
itself defined by its connection with the separate optic neuropil. Although 
the great appendage is incomplete in YKLP 11075, its attachment point in 
other specimens of Alalcomenaeus (Fig. 3b) and in Leanchoilia (Fig. 3c) 
best corresponds to the position of the prominent deutocerebral neuro- 
mere in YKLP 11075. 

The distribution of neuromeres in the head of YKLP 11075 conforms 
to the presence of three pairs of biramous limbs posterior to the great 
appendage, although only two pairs have been described from the Bur- 
gess Shale Alalcomenaeus cambricus'’. Two pairs had likewise been 
documented in Leanchoilia superlata’’, but a small first pair was subse- 
quently identified in that species'*. The presence of three pairs in the 
present Chengjiang Alalcomenaeus (Fig. 3a, b) suggests that this is a 
diagnostic feature of leanchoiliids generally, and brings this group into 

Figure 4 | Nervous systems of Chelicerata. a-c, Reconstruction of ‘great
appendage’ arthropod and Chelicerata nervous systems. a, Alalcomenaeus sp. 
b, Larval Limulus polyphemus (Xiphosura). c, Centruroides sculpturatus 
(Scorpiones). d-f, Enlargements of corresponding segmental neuromeres and 
optic neuropils (shown in navy blue). Each eye supplies its first optic neuropil 
(on1) outside the protocerebral mass; second order optic neuropils (on2) are 
integrated within the protocerebrum (pr). The oesophageal foramen (Oe) 
reaches the caudal margin of the protocerebrum; second and third neuromeres 
line with the ground pattern of crown-group euarthropods as well as
trilobites, which possess paired uniramous antennae and three bira- 
mous post-antennal head segments”. 

Descriptions of the Leanchoilia midgut show a uniformly broad, conti- 
guous and segmentally convoluted organ system of associated glands 
distinct from segmental ganglia’, originating at the level of C3 and 
terminating at trunk segment T8 (ref. 20), whereas a more tubular gut 
tract is seen in Alalcomenaeus cambricus’*. A gut identity can be ruled 
out for the nerve cord in Alalcomenaeus because of its continuity with 
the brain and the obvious intersegmental constrictions between segmental
ganglia (Fig. 1e). In Alalcomenaeus the oesophageal foramen
reaches forward to the caudal margin of the protocerebral neuromere
(Figs 1e and 2a), demonstrating that in this taxon only the protocerebrum
is supraoesophageal, a condition typifying Limulus and Centruroides
and suggested as ancestral for chelicerates*. Chelicerate brains
differ from those of mandibulates in that only the first-order visual 
neuropil of each eye lies separate from the brain, connected by relays to 
second- and third-order visual centres integrated within the protocere- 
brum. This organization, shared by pycnogonids”, xiphosurans” and 
arachnids”’, contrasts with mandibulates, in which three (Malacostraca, 
Insecta) or two (Scutigeromorpha) nested optic neuropils reside sepa- 
rate from, but connected to, the mid-brain proper’’. Furthermore, 
other than in pycnogonids, all extant chelicerates, including Limulus, 
show fusion of cerebral and trunk ganglia. Alalcomenaeus shares these 
crucial elements of the chelicerate central nervous system ground pattern. 
Phylogenetic analysis of a comprehensive data set of neural characters”* 
(Supplementary Tables 1 and 2) resolves fusion of cerebral and trunk 
ganglia, and the single optic neuropil outside the protocerebrum, as 
apomorphies of Chelicerata s.l. (sensu lato; Pycnogonida and Euchelicerata),


flank the foramen. In Centruroides, the protocerebrum is recurved over the 
cheliceral (ch) and pedipalp (pp) neuromeres (homologues of Alalcomenaeus 
great appendage neuromere (GA) and first cephalic appendage neuromere 
(C1), and Limulus cheliceral and pedipalp neuromeres). C1-C3, first-third 
cephalic neuromeres; deu, deutocerebrum; L1-L4, first to fourth leg 

ganglia; op1, op2, first and second opisthosomal neuromeres; T1-T8, trunk 
ganglia 1-8; tri, tritocerebrum. Eyes shown in brown, faceted eyes indicated 
by radial divisions. 
including Alalcomenaeus. The only arthropod outside the chelicerates
known to possess a nervous system condensed into a single mass is the 
hemipteran water strider (Gerridae), but as in other mandibulates the 
three nested optic lobes extend laterally from the protocerebrum’. 
Segmentation of the postoral nervous system in Alalcomenaeus is a 
shared derived character of Euarthropoda rather than an indicator of 
affinity to a particular euarthropod clade. 

Mandibulate arthropods possess just one pair of compound eyes, 
except where the upper and ventral halves are specialized to serve diffe- 
rent perceptual functions, as in Bibionidae and Gyrinidae**”*. Multiple 
pairs of eyes typifies chelicerates. Possession of paired eyes disposed late- 
rally and relaying to their optic neuropils fits the cladogram (Extended 
Data Fig. 2) as a derived character of Chelicerata s.l. including Alalcomenaeus.
We infer this character to also apply to Leanchoilia where
four visual units (as in Alalcomenaeus sp.) have been unambiguously 
described for L. superlata and L. persephone'*'*. Structures in Lean- 
choilia interpreted as paired pendulous eyes ‘structured like a bunch of 
grapes” are likely cuticular features of the great appendages’. 

The utility of computed synchrotron and phase-contrast radiation 
X-ray tomography for resolving soft tissue preservation has precedents 
in studies of ancient soft tissue from Cambrian Orsten trilobites** and 
crustaceans”, including data on the gut, connective tissue and possible 
segmental muscle. For this account, computed tomography was achieved
on a specimen that, unlike most Chengjiang specimens, showed considerable
preservation in depth. Resolving nervous tissue resulted from scanning
conjoined part and counterpart and subtracting contaminants of
other tissue fragments (see Methods). Such combinatorial approaches 
should yield comparable internal details for other Burgess-Shale-type 
fossils that show appropriate dimensionality. 


METHODS SUMMARY 


Part and counterpart of the specimen (YKLP 11075) were precisely aligned for X-ray 
computed tomography (see Methods, Supplementary Information). Elemental distri- 
bution analyses were obtained using energy-dispersive X-ray fluorescence (EDXRF) 
microscopy. Light microscopy images were processed with Adobe Photoshop 
Elements 10 (Adobe Systems) using enhance functions for colour correction, 
balance, overlays, subtractions and inversions (see Methods, Supplementary
Information and Figs 1d-g, 2e and Extended Data Fig. 1).


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS


Matrix obscuring parts of the fossil was removed with a fine needle under a binocular
microscope. Part and counterpart of the specimen (YKLP 11075; stored at
the Yunnan Key Laboratory for Palaeobiology, Yunnan University) were precisely
aligned, wrapped in cotton wool for stability and placed into a film case. For X-ray
computed tomography (CT) the encased specimen was scanned using a ScanXmate-L
system (Comscantecno) at 151 kV with a resolution of 28.8 µm (Fig. 1c) or 14.4 µm
(Fig. 1g and Extended Data Fig. 1). Molcer Plus software (White Rabbit) was used
to convert the two-dimensional CT image stacks into a three-dimensional (3D)
image. For Fig. 1c, volume rendering was performed on a 1332 TIFF image stack
(28.8 µm stack intervals). For Fig. 1g and Extended Data Fig. 1, volume rendering
was performed on a 1264 TIFF image stack (14.4 µm stack intervals). Contrast and
brightness of the 3D images were processed with Molcer Plus software. Elemental
distribution analyses were obtained using energy-dispersive X-ray fluorescence
(EDXRF) microscopy with a HORIBA XGT-7000V (HORIBA) at 50 kV accelerating voltage
and 1 mA probe current, using mono-capillary primary optics to focus the X-ray
beam to a diameter of 100 µm. YKLP 11075 was attached to an aluminium stage
using conductive carbon tape, its position in the vacuum chamber adjusted using a
motorised xyz platform, and viewed using three integrated colour video cameras.
An area of 25.6 mm × 7.2 mm was analysed under full vacuum using 50 µm steps
and 200 × 74 frames to provide two-dimensional distribution maps of Fe and Cu
recorded as high-resolution TIFF images. Light microscopy photographs were
taken with a Leica DFC 500 digital camera attached to a Leica M205C microscope.
Images were processed with Adobe Photoshop Elements 10 (Adobe Systems) using
enhance functions for colour correction, balance, overlays, subtractions and
inversions (Figs 1d-g, 2e and Extended Data Fig. 1).

Each reconstruction in Fig. 4 has been scaled up or down to aid comparison. 
Figure 4b is based on published immunocytological data’'. Figure 4c is adapted 
from histological observations of scorpion central nervous system’. 

Extended Data Figure 1 | Cephalic region of Alalcomenaeus sp. YKLP
11075. All in dorsal view, composites of part and counterpart (upper left).
Second left to right: CT scan (green); EDXRF Fe (red); superimposition of CT
and EDXRF Fe. Lower row, left to right: EDXRF Cu (blue); superimposition of
CT and EDXRF Cu; superimposition of EDXRF Fe and EDXRF Cu;
superimposition of all scans. C1, first post-GA neuropil = tritocerebrum (tri);
C2, second post-GA neuropil; GA, great appendage neuropil = deutocerebrum
(deu); on1, first optic neuropil; pr, protocerebrum.

Extended Data Figure 2 | Arthropod relationships based on neuroanatomical characters.
Strict consensus of 34 shortest cladograms based on 145 characters in
Supplementary Information Table 2.
Clade labels in the cladogram: Onychophora, Euarthropoda, Chelicerata s.l.,
Chelicerata, Mandibulata, Myriapoda, “Crustacea”, Hexapoda.
Terminal taxa as listed in the figure: Euperipatoides rowelli, Lycythorhyncus sp.,
Limulus polyphemus, Hadrurus arizonensis, Eremobates pallipes, Mastigoproctus
giganteus, Phrynus marginemaculata, Heptathela kimurai, Cupiennius salei,
Alalcomenaeus sp., Orthoporous ornatus, Scolopendra polymorpha, Scutigera
coleoptrata, Tigriopus californicus, Godzilliognomus frondosus, Triops
longicaudatus, Artemia salina, Nebelia pugettensis, Ligia occidentalis,
Pseudosquilla ciliata, Astacus fluviatalis, Hemigrapsus oregonensis, Ceonobita
clypeatus, Machilis germanicus, Lepisma saccharina, Periplaneta americana,
Mantis religiosa, Apis mellifera, Polistes flavus, Dasymutilla sp., Schistocerca
americana, Potamanthus luteus, Libellula saturata, Aquarius remigis,
Thermonectus marmoratus, Dineutus sublineatus, Chauliognathus lecontei,
Anisolabis maritima, Macroglossum stellatarum, Drosophila melanogaster.

A juvenile mouse pheromone inhibits sexual
behaviour through the vomeronasal system


David M. Ferrero’, Lisa M. Moeller?, Takuya Osakada’, Nao Horio’, Qian Lil, Dheeraj S. Roy', Annika Cichy’, Marc Spehr?, 


Kazushige Touhara** & Stephen D. Liberles! 


Animals display a repertoire of different social behaviours. 
Appropriate behavioural responses depend on sensory input 
received during social interactions. In mice, social behaviour is 
driven by pheromones, chemical signals that encode information 
related to age, sex and physiological state’. However, although mice 
show different social behaviours towards adults, juveniles and neo- 
nates, sensory cues that enable specific recognition of juvenile mice 
are unknown. Here we describe a juvenile pheromone produced by 
young mice before puberty, termed exocrine-gland secreting peptide 
22 (ESP22). ESP22 is secreted from the lacrimal gland and released 
into tears of 2- to 3-week-old mice. Upon detection, ESP22 activates 
high-affinity sensory neurons in the vomeronasal organ, and down- 
stream limbic neurons in the medial amygdala. Recombinant ESP22, 
painted on mice, exerts a powerful inhibitory effect on adult male 
mating behaviour, which is abolished in knockout mice lacking 
TRPC2, a key signalling component of the vomeronasal organ”’. 
Furthermore, knockout of TRPC2 or loss of ESP22 production re- 
sults in increased sexual behaviour of adult males towards juveniles, 
and sexual responses towards ESP22-deficient juveniles are sup- 
pressed by ESP22 painting. Thus, we describe a pheromone of sexually 
immature mice that controls an innate social behaviour, a response 
pathway through the accessory olfactory system and a new role for 
vomeronasal organ signalling in inhibiting sexual behaviour towards 
young. These findings provide a molecular framework for under- 
standing how a sensory system can regulate behaviour. 

We developed a genome-based strategy for identifying additional 
mouse pheromones (Fig. 1a). Chemicals that function as pheromones 
include urinary volatiles, steroid derivatives and proteins secreted into 
bodily fluids such as urine, tears and saliva*’. Several protein phero- 
mones are encoded by large, rapidly evolving gene families, but most 
pheromone homologues encoded by the mouse genome are of un- 
known function*°. We constructed quantitative PCR (qPCR) primers 
to detect expression of protein pheromones and their homologues, 
including exocrine gland-secreting peptides (ESPs), androgen-binding 
proteins (ABPs), major urinary proteins and other lipocalins. 
Expression levels were quantified in complementary DNA (cDNA) 
derived from various pheromone-producing tissues obtained from 
mice of different sexes, ages and physiological states. 

Using this strategy, we identified several peptides with striking age- 
dependent production in the extraorbital lacrimal gland, including 
ESP22 produced by juveniles, ESP15 and ESP16 produced by adults 
of both sexes and ABP27 produced by neonates (Fig. 1b and Extended 
Data Fig. 1a). We also identified male-enriched peptides of unknown 
function, including ESP24 and various ABPs. Interestingly, sexually 
dimorphic production of ESP24 and the male pheromone ESP1 was 
similar (~500-fold male-enriched), but occurred in different mouse 
strains (Extended Data Fig. 2b). 

Because juvenile pheromones are unknown, we performed additional
studies of ESP22. ESP22 was maximally expressed in lacrimal
gland between 2 and 3 weeks of age, and decreased sharply after
4 weeks of age, near puberty (Fig. 1b). Quantitative analysis indicated 
ESP22 expression in lacrimal gland to be similar in male and female 
juveniles, and approximately 50-fold higher in juveniles than adults 
(Extended Data Fig. 2d). ESP22 expression was not detected in cDNA 
derived from 16 other mouse tissues, including other exocrine glands, 
internal organs and sensory epithelia (Fig. 1c and Extended Data Fig. 
2f). In contrast, ABP27 expression was detected in adult salivary gland 
as well as neonatal lacrimal gland (Extended Data Fig. 2e). 

Next, we identified lacrimal gland cell types that expressed ESP22 and 
other pheromone homologues using RNA in situ hybridization. We 
found that ESP22 is produced by a subset of lacrimal secretory cells, 
termed acinar cells (Extended Data Fig. 1c), which release contents into 
tears, a source of mouse pheromones’. ESP22 expression was detected in 
juvenile but not adult acinar cells, whereas ESP24 expression was detect- 
ed only in adult male acinar cells (Fig. 1d). Furthermore, Esp22 was 
not expressed in castrated and ovariectomized adults, suggesting sex- 
hormone-independent Esp22 gene regulation (Extended Data Fig. 1b). 

To test whether ESP22 protein was secreted into tears by acinar cells, 
we generated and affinity-purified a polyclonal anti-ESP22 antibody. 
Western blot analysis using this antibody identified a protein of 
expected mass (approximately 10 kDa) that was enriched in juvenile 
tears (Fig. 1e). Concentrations of this protein (3-5 ng µl⁻¹ in juvenile
tears, or 300-500 nM) were determined using a standard curve of
recombinant ESP22 (Extended Data Fig. 3). Mass spectrometry iden- 
tified ESP22-derived tryptic peptides in tears of juveniles but 
not adults, indicating greater than 100-fold enrichment (Fig. 1f), 
and showed the primary structure of mature ESP22 (amino acids 
23-111, Extended Data Fig. 4). Together, these findings indicate that 
ESP22 is a lacrimal peptide secreted into tears of juvenile mice. 
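
The two concentration figures quoted above (mass per volume and molar) can be cross-checked with a short calculation. The sketch below assumes a molecular mass of about 10 kDa, the approximate value given for the western blot band; the function name is illustrative and not part of the original analysis.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mass_kda: float) -> float:
    """Convert a protein concentration from ng/µl to nM.

    1 ng/µl equals 1e-3 g/l; dividing by the molar mass in g/mol gives
    mol/l, and multiplying by 1e9 gives nM.
    """
    molar_mass_g_per_mol = mass_kda * 1000.0
    mol_per_l = (conc_ng_per_ul * 1e-3) / molar_mass_g_per_mol
    return mol_per_l * 1e9


# Assumption: ~10 kDa, the approximate mass reported for the ESP22 band.
low = ng_per_ul_to_nM(3.0, 10.0)
high = ng_per_ul_to_nM(5.0, 10.0)
print(f"{low:.0f}-{high:.0f} nM")
```

Under that mass assumption, 3-5 ng µl⁻¹ corresponds to the 300-500 nM range given in the text.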

Next, we asked whether ESP22 was detected by the mouse olfactory 
system. Other protein pheromones, including ESP1, activate basal 
vomeronasal organ (VNO) sensory neurons'’, so we examined elec- 
trophysiological responses to ESP22 in the VNO. Recombinant ESP22 
was prepared as a fusion protein with maltose binding protein (MBP), 
which enhanced solubility’*. Electrovomeronasogram (EVG) record- 
ings indicated that recombinant ESP22 (200 nM) evoked a negative 
field potential in the VNO (Fig. 2a), with a sensitivity matching ESP1 
responses previously reported with this technique'*’*. MBP was not 
similarly detected, although small EVG responses to MBP were ob- 
served at higher concentrations (data not shown). High-affinity res- 
ponses to ESP22 in the VNO required the ion channel TRPC2 (Fig. 2a), 
and were not observed in electroolfactogram (EOG) recordings of the main 
olfactory epithelium (Fig. 2b), which is also important for pheromone- 
driven social behaviours'*””. 

Next, we used extracellular loose-seal recordings to examine ESP22 
responses in individual VNO sensory neurons. ESP22 evoked robust 
and repetitive discharge patterns in 1.3% of basal VNO sensory neu- 
rons (5/383), consistent with detection by one or a few VNO receptors 

Figure 1 | ESP22 is secreted into juvenile tear fluid. a, Strategy to identify
mouse pheromones. Abp-β27 is also known as Scgb2b27; Abp-β2 is also known
as Scgb2b2. b, Age-dependent gene expression in lacrimal gland (LG)
determined by qPCR (n = 4-12, mean ± s.e.m.). c, Esp22 expression in juvenile
and adult tissues determined by PCR with reverse transcription (RT-PCR);
olfactory epithelium (OE), olfactory bulb (OB), harderian gland (HG),
submaxillary gland (SMG), parotid gland (PG), sublingual gland (SLG).
d, Age- and sex-dependent Esp expression in lacrimal gland determined by
in situ hybridization. Scale bar, 100 µm. e, Western blot analysis of tears using
anti-ESP22 antibody. f, Mass spectrometry analysis of an ESP22-derived tryptic
peptide (GIVFNTIK) from tears.


(Fig. 2c, Extended Data Fig. 5 for higher [ESP22]). Threshold ESP22 
responses observed by single unit extracellular recordings (Fig. 2d) 
occurred at similar concentrations (20 pM) to threshold ESP1 responses 
previously measured using genetically encoded calcium indicators"®. 
Most neurons responsive to ESP22 were activated by juvenile tears but 
not by MBP or adult tears (6/11, Fig. 2e and Extended Data Fig. 5), with 
neuron viability verified by K⁺-mediated depolarization. High-affinity
ESP22 responses were also recorded in 1-2% of VNO sensory neurons 
using current-clamp recording techniques and single-neuron calcium 
imaging (data not shown). 
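
The responsive fraction reported above (5 of 383 neurons) is a small-count proportion, so it can help to see how much uncertainty such a count carries. The sketch below recomputes the percentage and adds a 95% Wilson score interval; the interval is an illustration added here and is not a statistic reported by the authors.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


responsive, recorded = 5, 383  # counts reported in the text
fraction = responsive / recorded
low, high = wilson_interval(responsive, recorded)
print(f"{100 * fraction:.1f}% responsive "
      f"(95% Wilson interval {100 * low:.1f}-{100 * high:.1f}%)")
```

The Wilson interval is used here because it behaves better than the normal approximation when the proportion is close to zero.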

Figure 2 | ESP22 activates the vomeronasal system. a, b, EVG (a) and EOG
(b) recordings in Trpc2−/− and Trpc2+/+ mice exposed to ESP22 (200 nM) and
MBP (450 nM). c, The percentage of basal VNO sensory neurons (n = 383)
responsive to ESP22 (20 pM) and KCl (50 mM) determined by single-unit
extracellular loose-seal recordings. d, e, Responses of single VNO sensory
neurons. f, g, Visualization and quantification of cFos-expressing neurons in
MeA, determined by immunohistochemistry in coronal brain sections from
ESP22- and MBP-exposed male mice (mean ± s.e.m., n = 3, *P < 0.05,
**P < 0.01, Student’s one-tailed t-test).


We next identified limbic neurons activated by ESP22 exposure 
using immunohistochemistry for the neural activity marker cFos in 
cryosections of adult male mouse brains. ESP22 and juvenile tears 
(Fig. 2f and Extended Data Fig. 6) induced cFos expression in the 
medial amygdala (MeA), a region that receives VNO input by way 
of the accessory olfactory bulb’*"”. cFos responses were not observed 
in Trpc2−/− mice (Fig. 2f), or in other amygdala regions that receive
olfactory input (Extended Data Fig. 6). cFos responses were enriched in 
the postero-ventral MeA (Fig. 2g), which sends projections to hypotha- 
lamic areas that control defensive and reproductive responses””’. 

These findings indicate ESP22 to be a juvenile chemosignal that
activates a VNO response pathway. However, a role for the VNO in regulating
adult-juvenile social interactions is unknown. Trpc2−/− mice provide
a valuable tool for VNO loss-of-function studies, and show severe
deficits in sex recognition’. Here, we introduced Trpc2+/+ or Trpc2−/−
males to juveniles and monitored social behaviour.

Surprisingly, we observed that Trpc2−/− mice displayed a striking increase
in sexual behaviour towards prepubescent females (Supplementary
Videos 1 and 2). Although Trpc2+/+ mice showed rare mounting
attempts towards juvenile females, Trpc2−/− mice showed vigorous
mounting behaviour quantified as increases in mean mounting attempts
and the percentage of animals mounting in 3 and 10 min, as well as
decreases in mounting latency and intermount interval (Fig. 3 and
Extended Data Fig. 7a). A similar percentage of Trpc2+/+ males showed
mounting behaviour by 30 min, but these mounts were rare and did not
increase in frequency during the trial duration (Fig. 3d and Extended Data
Fig. 7b). In contrast, the sexual behaviour of Trpc2−/− and Trpc2+/+ males
towards adult females was similar, as reported previously’. Trpc2−/−

Figure 3 | Trpc2−/− males display increased sexual behaviour towards
juveniles. a, Raster plots depicting individual mounting displays of adult
Trpc2+/+ and Trpc2−/− males (n = 12) towards female juveniles (C57BL/6,
2-3 weeks old) during behavioural testing (30 min). Each tick indicates onset of
one mount. b, c, Quantitative analysis of parameters associated with sexual
behaviour towards juvenile and adult females shown by Trpc2+/+ and Trpc2−/−
males (mean ± s.e.m., *P < 0.05, **P < 0.01, Mann-Whitney U-test).


mice showed sexual behaviour towards juvenile females even when pre- 
sented simultaneously with adult oestrous females (Extended Data Fig. 7c), 
and also showed increased sexual behaviour towards juvenile males 
(Extended Data Fig. 8). On the basis of these findings, VNO signalling 
normally prevents mating advances towards young, and one mecha- 
nism probably involves detection of chemosignals released from juvenile 
animals. 

We reasoned that ESP22 is an excellent candidate to function as 
such a mating inhibitor based on the timing of its expression, the role 
of another ESP as a pheromone”’ and the ability of ESP22 to activate 
both VNO sensory neurons and central limbic regions. ESP22 is juven- 
ile-enriched in several strains of mice, but we identified two strains 
(C3H and CBA) that lacked juvenile ESP22 expression (Fig. 4a). These 
mouse strains provided valuable tools for controlling ESP22 levels 
during social interactions, and we observed increased sexual behaviour 
of wild-type males towards C3H and CBA juveniles (Fig. 4b). 

We asked whether painting recombinant ESP22 onto C3H juveniles 
blocked male sexual approaches. We observed that males displayed 
similar levels of sexual behaviour towards unpainted, ESP6-painted 
and MBP-painted C3H juveniles (Fig. 4d). However, males showed a 
significant reduction in mounting attempts and an increase in mounting
latency towards C3H juveniles painted with ESP22 (1 ng). Higher
ESP22 amounts (10 µg) caused a striking 70-fold reduction in mounting
attempts towards C3H juveniles, with most animals (10/11) failing to
show a single mating attempt during the entire 30 min trial (Fig. 4c, d). A
dose-dependent analysis indicated that amounts of ESP22 derived from 

Figure 4 | ESP22 inhibits male sexual behaviour. a, ESP22 levels in lacrimal
gland from mouse strains and ages indicated (n = 5-12, averages ± s.e.m.).
b, Sexual behaviour of wild-type males towards juveniles from strains indicated
(n = 11 or 12, mean ± s.e.m.). c-g, Raster plots and quantification of sexual
behaviour shown by wild-type males (d-f) or Trpc2−/− males (f) towards C3H
juveniles (d-f) or C57BL/6 oestrous females (g) painted with ESP6 (10 µg),
ESP22 (10 µg or indicated) or MBP (4 mg) (n = 9-12, averages ± s.e.m.).
Arrow depicts ESP22 concentration in C57BL/6 juvenile tears. h, Model for
ESP22 signalling. *P < 0.05, **P < 0.01, Student’s one-tailed t-test (a), one-
way (b, d, e, g) or two-way (f) analysis of variance (ANOVA) followed by
Tukey’s honestly significant difference (HSD) post hoc tests.


small quantities of juvenile tears (<200-333 nl) were sufficient for 
inhibition of adult male sexual behaviour (Fig. 4e). ESP22 was not aver- 
sive, as ESP22 painting did not affect social interaction time (Extended 
Data Fig. 9). ESP22 also did not inhibit sexual behaviour of Trpc2−/−
males, consistent with a role for vomeronasal circuits in mediating ESP22
responses (Fig. 4f). Interestingly, Trpc2−/− males did not show further
increases in sexual behaviour towards C3H juveniles (Fig. 4f), suggesting 
that C3H juveniles do not release other VNO-dependent mating inhibi- 
tors. However, ESP22 did inhibit the sexual behaviour of C3H adult males, 
which presumably have encountered little or no ESP22 previously 
(Extended Data Fig. 10). Finally, recombinant ESP22 also decreased sexual 
behaviour towards adult females in oestrus (Fig. 4g). Lower levels of
sexual behaviour persisted towards ESP22-painted oestrous females, sug- 
gesting that oestrous females release other signals that counteract ESP22. 
On the basis of these findings, ESP22 is a juvenile pheromone that blocks 
sexual behaviour through the vomeronasal system (Fig. 4h). 
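
The tear-volume range quoted above (200-333 nl) is what one obtains by dividing a 1 ng dose by the measured tear concentration of 3-5 ng µl⁻¹. The sketch below makes that arithmetic explicit; reading the range this way is an inference from the reported numbers, not a calculation stated by the authors, and the function name is illustrative.

```python
def tear_volume_nl(mass_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of tear fluid (nl) that contains a given mass of ESP22."""
    return mass_ng / conc_ng_per_ul * 1000.0  # µl -> nl


# 1 ng dose and the 3-5 ng/µl tear concentration, both taken from the text.
for conc in (3.0, 5.0):
    print(f"{conc} ng/µl -> {tear_volume_nl(1.0, conc):.0f} nl")
```

On this reading, the lower concentration bound gives the larger volume (about 333 nl) and the upper bound gives 200 nl.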

Behavioural responses to ESP22 differ from responses to other VNO 
activators, such as pheromones and predator odours that trigger mating,
aggression and fear'*'*’?”*. These findings are consistent with the
existence of parallel subcircuits of the accessory olfactory system, 
which selectively channel sensory inputs to enable proper selection 
of a behavioural display”®. Identifying a collection of VNO activators 
that regulate different instinctive behaviours provides a valuable tool- 
box to understand how a sensory system controls behaviour. 


METHODS SUMMARY 


All animal procedures were in compliance with institutional animal care and use 
committee guidelines. Full details of experimental procedures for qPCR analysis, 
RNA in situ hybridization, western blot analysis, mass spectrometry, recombinant 
proteins, electrophysiology, cFos staining, behaviour analysis and statistical ana- 
lysis are provided in Methods. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS


Animals. All animal procedures were in compliance with institutional animal care 
and use committee guidelines. Mice from various strains, as well as castrated and 
ovariectomized animals, were obtained from Jackson Laboratory and Charles 
River Laboratories unless otherwise noted. Trpc2−/− mice were provided by
R. Axel. C3H mice refer to the strain C3H/He. Oestrus was induced in
ovariectomized adult females (C57BL/6, 10-12 weeks) by timed injection of
oestradiol benzoate (Sigma-Aldrich, 10 µg in sesame seed oil, subcutaneous
injection) 48 and 24 h before testing, and progesterone (Sigma-Aldrich,
500 µg in sesame seed oil, subcutaneous injection) 3 h before testing.

qPCR analysis. cDNAs from the extraorbital lacrimal gland and other tissues were 
prepared from animals of ages and sexes indicated using published protocols”. 
Copy number, unless otherwise indicated, refers to abundance in cDNA derived 
from 50 ng of RNA, with absolute values determined in control PCR reactions 
involving plasmid titrations. qPCR primers were verified not to cross react with 
closely related ESPs and ABPs (>60% identity) based on control reactions
involving ESP- and ABP-encoding plasmids (Extended Data Fig. 2a).
(Esp22 forward: 5'-GTCCCGGAATCTGTTATCCA-3'; Esp22 reverse:
5'-CAGCAATGCTCACTGAAGGA-3'. Esp15 forward:
5'-AACAGGAGCTGCTCTGAATTA-3'; Esp15 reverse:
5'-GCCTATGACAGAGCCACTTA-3'. Esp16 forward:
5'-TCTGTGTCTCATGCACTGCTTCCT-3'; Esp16 reverse:
5'-GGAAGTATTGTTGGAAACACCAGAAA-3'. Esp6 forward:
5'-TCCTTGGTCCTGAGATTGCT-3'; Esp6 reverse:
5'-TTTGCTCACCAACCCAACCA-3'. Abp-β27 forward:
5'-GGTGGAAATAGGCTAGCTCTGA-3'; Abp-β27 reverse:
5'-GGGTTCCAGAAGTATATTTCTTATA-3'. Abp-β2 forward:
5'-AGCATGCATACCTTTCTTCGGCGTA-3'; Abp-β2 reverse:
5'-TGCATTCTGAGCTGAAGAGTATAGTTGT-3'.) Different primers
were used in PCR reactions described in Fig. 1c (Esp22 forward:
5'-ATGAATTCTGTCCCAGTCATG-3'; Esp22 reverse:
5'-TCAAGTATTTGTCAAAAGGCGT-3'),
and specific amplification of the Esp22 gene was verified by DNA sequencing.
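
Primer lists like the one above are easy to mis-transcribe, so a quick sanity check of length, GC content and reverse complement can be useful. The sketch below applies such a check to the two Esp22 qPCR primers as listed; the helper names are illustrative, and the check is not part of the authors' protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Esp22 qPCR primers as listed in the text (written 5' to 3').
primers = {
    "Esp22 forward": "GTCCCGGAATCTGTTATCCA",
    "Esp22 reverse": "CAGCAATGCTCACTGAAGGA",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, {gc_percent(seq):.0f}% GC, "
          f"reverse complement {reverse_complement(seq)}")
```

The same loop can be extended to the other primer pairs by adding them to the dictionary.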
RNA in situ hybridization. In situ hybridization analysis of lacrimal gland tissue 
was performed using established techniques involving colorimetric visualization” 
or multicolour fluorescence*’. cRNA riboprobes were used for Esp22 (full coding
sequence plus 500 base pairs of the 3’ untranslated region), Esp24 and Esp15 (full 
coding sequence), and Rab3D (926 base-pair sequence amplified by primers 
TTCCGCTATGCCGATGACTC and TGACAACTTCAGCCAGCGAT). The 
Esp22 riboprobe shares less than 75% identity with other Esp genes, a level of 
identity below what typically results in cross-hybridization under high stringency 
conditions used. Furthermore, Esp22 is most closely related to Esp24, and we did 
not observe cross-hybridization between these genes (Fig. 1d). Images were taken 
on a Nikon 80i upright microscope for colorimetric images, and on a Leica TCS 
SP5 II confocal microscope for fluorescent images. 

Western blot analysis. Anaesthetized mice were injected with pilocarpine
(Sigma-Aldrich, 0.5 µg per g body weight, intraperitoneally) and tear fluid
was collected using microcapillary pipettes. Proteins in tear fluid were
separated by electrophoresis using 16.5% Tris-Tricine gels (Bio-Rad),
transferred to PVDF membranes (Immobilon) and incubated with a rabbit
polyclonal antibody raised against
ESP22 (amino acids 51-63: CRRLRDVPESVIH, New England Peptide, 1:500, 
24-48 h, 4°C). Bound antibody was detected with a donkey anti-rabbit 800 In- 
frared Dye (Odyssey, 5,000:1, 45 min, room temperature). Blots were analysed 
using a Quantitative IR Western Blot Detection LI-COR (Odyssey) and the Li- 
COR Quantitative Gel Documentation and Blot Detection Software (Odyssey). 
Mass spectrometry. Tear fluid was collected and separated by gel electrophoresis 
as described above. Excised gel bands containing approximately 3-14 kDa pro- 
teins were then subjected to a modified in-gel trypsin digestion procedure”. 
Samples were loaded by a Famos auto sampler (LC Packings) onto a nano-scale 
reverse-phase high-performance liquid chromatography capillary column”, and 
eluted using a gradient of increasing acetonitrile containing 0.1% formic acid. 
Eluted peptides were subjected to electrospray ionization and analysed by an 
LTQ-Orbitrap mass spectrometer (Thermo Fisher). Eluted peptides were isolated, 
including those corresponding to m/z 612.4 and 446.3 (the +2 charge states of the 
ESP22 tryptic peptides DVPESVIHISK and GIVFNTIK), and fragmented to produce
a tandem mass spectrum of specific fragment ions for each peptide. Peptide
sequences were determined using Sequest (ThermoFinnigan)”. 
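
The m/z values quoted above for the doubly charged tryptic peptides can be approximated from standard monoisotopic residue masses. The sketch below is an illustrative recalculation, not the Sequest workflow used in the study; it uses the peptide spelling GIVFNTIK from the Fig. 1f legend, which is the spelling whose computed m/z agrees with the reported 446.3.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # mass of H2O added for the peptide termini
PROTON = 1.007276   # mass of one proton


def peptide_mz(sequence: str, charge: int) -> float:
    """Return the m/z of an unmodified peptide at the given positive charge."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


for peptide in ("DVPESVIHISK", "GIVFNTIK"):
    print(f"{peptide}: m/z {peptide_mz(peptide, 2):.2f} at charge +2")
```

Both computed values fall within about 0.1 of the m/z values given in the text.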

Recombinant proteins. A gene encoding the secreted form of ESP22 (Ala23-End) 
was cloned into pMAL-c5x bacterial expression vector (New England Biolabs) 
using SacI and BamHI restriction sites. ESP22 was expressed and purified as a 
fusion protein with maltose-binding protein (MBP) in BL21(DE3) cells following 
the manufacturer’s protocols (pMAL Protein Fusion & Purification System, New 
England Biolabs). Protein was eluted from an amylose affinity resin using maltose 
and concentrated using a centrifugal filter unit (Millipore). The ESP6 coding 
sequence was subcloned into the expression vector pET-28a (Novagen) and puri- 
fied as described previously’. 


Electrophysiology. EVG, EOG and extracellular recordings were performed as 
described previously with minor modifications'**°*". To prevent dialysis of intracellular 
components, action-potential-driven capacitive currents were recorded in ‘loose-seal’ 
cell-attached configuration (seal resistance 30-150 MΩ) from vomeronasal sensory
neuron somata located deep in the sensory epithelium’s basal layer close to the base- 
ment membrane. Spikes were analysed using Igor Pro functions (SpAcAn, G. Dugué 
and C. Rousseau). Inter-stimulus intervals were 30 s. Neuronal responses were classified 
according to the following criteria: (1) discharge was time-locked to stimulus presentation
(responses occurred during and/or up to 3 s after stimulation onset); (2) spike patterns 
clearly deviated from previous baseline activity (frequency histograms (1 s bin width) 
were calculated over repeated trials and responses were evaluated according to a 
$\Delta f \geq 2\sigma(f_{\mathrm{baseline}})$ criterion). MBP and ESP22 evoked TRPC2-independent EVG and
EOG responses at approximately 100-fold higher concentrations (data not shown). 
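
The second classification criterion described above (a change in firing frequency of at least two standard deviations of baseline, evaluated on 1 s bins) can be sketched as a small function. This is a minimal illustration under stated assumptions: the peak bin in the response window is compared with the baseline mean, the population standard deviation is used, and all numbers are hypothetical rather than data from the study. The actual analysis was done with Igor Pro functions and may differ in these details.

```python
import statistics


def is_responsive(baseline_hz, response_hz, n_sd=2.0):
    """Apply a delta-f >= n_sd * SD(baseline) criterion to binned firing rates.

    baseline_hz: firing rates (Hz) in 1 s bins before stimulation.
    response_hz: firing rates (Hz) in 1 s bins inside the response window.
    """
    baseline_mean = statistics.mean(baseline_hz)
    baseline_sd = statistics.pstdev(baseline_hz)
    delta_f = max(response_hz) - baseline_mean
    # Require a genuine increase so a flat, silent trace is never a response.
    return delta_f > 0 and delta_f >= n_sd * baseline_sd


# Hypothetical binned rates, for illustration only (not data from the study).
baseline = [1, 2, 1, 0, 2, 1]
print(is_responsive(baseline, [6, 8, 5]))
print(is_responsive(baseline, [1, 2, 1]))
```

The time-locking criterion (1) is assumed to be handled by the caller, who passes only bins that fall during stimulation or up to 3 s after its onset.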
cFos staining. Sexually naive males (Japan SLC, C57BL/6, 9-11 weeks old) were 
housed individually (17 cm × 25 cm Plexiglas test chambers, 12 h light/12 h dark
cycle). Stimuli included ESP22 (250 µg), tear fluid (containing 50 µg protein) or
MBP (200 µg) in 20 mM Tris-HCl (pH 7.5, 100 µl) transfused onto a piece of
cotton (30 mg) and dried in a Speed Vac (3 h). High concentrations of ESP22 were 
necessarily used for cFos studies, as this non-volatile stimulus is poorly investi- 
gated when presented in isolation. Stimuli were placed on bedding during the dark 
phase (90 min), and all mice were observed to investigate the stimulus during 
testing. Mice were then anaesthetized with pentobarbital sodium and perfused 
quickly. Brains were removed and post-fixed in 4% paraformaldehyde in PBS (3 h, 
4°C) and cryoprotected in 15 and 30% sucrose solutions in PBS (4 °C). Immu- 
nohistochemistry and quantification of cFos-positive nuclei were performed as 
described previously’. MeA regions were defined using established anatomical 
landmarks (Extended Data Fig. 6), comparison with a reference image (Bregma
−1.58 mm)” and Lhx9 staining” (data not shown).

Behaviour. Before experiments, sexually naive adult males (2-4 months old, C57BL/6 
Trpc2+/+ or Trpc2−/−) were maintained under a reverse light cycle for 2 weeks and
individually housed for at least 24 h. Behavioural testing occurred in the home cage
with the food tray removed more than 3 h after onset of dark phase. A sexually naive
male or female (17- to 18-day-old juvenile or adult in oestrus) was introduced to the
male, and interaction behaviour was recorded for 30 min using a digital camcorder
compatible with low light conditions (Sony). For some experiments, females were
painted by swabbing stimulus (100 µl) on the back (50 µl), head (25 µl) and anogenital
region (25 µl) before testing. Mounting behaviour was defined when males used both
forepaws to climb onto a female for copulation, and parameters associated with 
mounting behaviour were analysed using Matlab (Mathworks). In rare cases 
(<5%), juvenile pups showed stimulus-independent escape behaviour and were 
excluded from analysis. Animals were randomly assigned to different testing condi- 
tions. For Fig. 4c-e, quantification was performed blind to experimental conditions. 
Statistical analyses. All samples represent biological replicates. Sample sizes for 
biochemistry, electrophysiology, cFos and behaviour met or exceeded the standards 
in the field. In Fig. 1b, sample sizes (n) for data points reading left to right are as follows: 
12, 7, 7, 8, 6, 13 for Esp22; 12, 7, 8, 8, 6, 12 for Esp6; 8, 5, 5, 7, 6, 12 for Esp15; 8, 8, 10, 8, 4, 8
for Abp-β27; 11, 8, 10, 8, 6, 10 for Abp-β2; and 8, 5, 6, 6, 5, 10 for Esp16. In Fig. 4a, sample
sizes reading left to right are 13, 10, 6, 6, 5, 7, 5 and 7. In Fig. 4b, sample sizes reading left
to right are 12, 9, 9 and 14. In Fig. 4d, sample sizes reading left to right are 14, 12, 11 and
9. In Fig. 4e, sample sizes reading left to right are 14, 12, 12, 12, 11, 11 and 11. In Fig. 4f,
sample sizes reading left to right are 12, 12 and 11. In Fig. 4g, sample sizes reading left
to right are 12, 11 and 11. Categorical data were analysed by a Fisher's exact test. Other 
reported P values were calculated using a one-tailed Student’s t-test (qPCR, cFos), 
Mann-Whitney U-tests (mouse behaviour) or one- or two-way ANOVA followed 
by Tukey’s HSD post hoc tests (mouse behaviour), as indicated in the figure legends. 
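
For readers unfamiliar with the categorical test named above, a two-sided Fisher's exact test on a 2x2 table can be written with the standard library alone. The sketch below is a generic illustration of the test, using a small hypothetical table rather than counts from the study; it does not reproduce the authors' statistical software.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical counts, for illustration only: animals that did or did not
# show a behaviour in two groups of four.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

The small tolerance factor in the comparison guards against floating-point ties between equally likely tables.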

Extended Data Figure 1 | RNA in situ hybridization to characterize
expression of Esp genes in the lacrimal gland. a-c, Colorimetric analysis in
tissue from animals indicated using cRNA riboprobes for (a) Esp15 and
(b) Esp22, and two-colour fluorescence analysis (c) in juvenile lacrimal gland
with cRNA riboprobes for Esp22 (red) and a marker for acinar secretory cells,
Rab3D (green). cRNA riboprobes for Esp15 are expected to cross-hybridize
with Esp16 mRNA. Some images used in b are identical to panels in Fig. 1d, and
are included for reference. Dashed boxes (c) indicate regions magnified below.
Arrows, acinar cells; arrowheads, ductal cells; scale bars, 100 µm (a, b, c top),
20 µm (c bottom).

Extended Data Figure 2 | qPCR analysis of gene expression. a, Esp22 qPCR
primers specifically detect a plasmid containing cloned Esp22, but not plasmids
containing other Esps with greater than 60% identity to Esp22. b-f, cDNA was
derived from lacrimal gland (b-e), submaxillary gland (e) or other tissues (f) of
animals indicated. In f, abundance is calculated by normalization to amounts of
Gapdh. C57BL/6 mice were used (b-d) unless otherwise indicated (b).
Experiments where sex is not indicated involved equal numbers of males and
females; olfactory epithelium (OE), olfactory bulb (OB), harderian gland (HG),
submaxillary gland (SMG), parotid gland (PG), sublingual gland (SLG) (n = 6-12,
averages ± s.e.m., **P < 0.01, two-way ANOVA followed by Tukey’s HSD post
hoc tests).

Extended Data Figure 3 | Quantification of protein concentrations in tear
fluid by western blot analysis using an anti-ESP22 antibody. a, b, A standard 
curve based on signal intensity was generated using different concentrations of 
recombinant ESP22 (a, left panel; b). The arrow indicates the intensity level of 
the band in the juvenile tear sample (a, right panel). c, Entire western blot 
analysis of tear fluid using anti-ESP22 antibody. 

Extended Data Figure 4 | ESP22-derived tryptic peptides identified by mass
spectrometry. a, The amino-acid sequence of immature ESP22 is depicted,
along with a predicted signal peptide and the epitope used for antibody
generation. Four tryptic peptides were identified by mass spectrometry
(highlighted in red), including one peptide containing the first amino acid after
the predicted signal sequence and another containing the encoded carboxy
(C)-terminal residue. Trypsin does not efficiently cleave amino (N)-terminal
lysines or arginines, consistent with R23 being the first amino acid in mature
ESP22. b, Mass spectrum of a high-performance liquid chromatography
fraction of juvenile tear fluid showing the ESP22-derived tryptic peptide
GIVFNTIK, with sequence identity confirmed by tandem mass spectrometry
analysis.

Extended Data Figure 5 | Electrophysiological responses to ESP22 in VNO
sensory neurons. a, Single-unit extracellular loose-seal recording from a single
VNO sensory neuron repeatedly exposed to different stimuli indicates
reproducibility of responses. b, The percentage of basal VNO sensory neurons
responsive to 20 pM (n = 383) and 2 nM (n = 749) ESP22.

Extended Data Figure 6 | cFos responses to ESP22 in the amygdala.
a, ESP22 and juvenile tear fluid, but not MBP, induce cFos expression in the
postero-ventral MeA. Dashed lines and arrows indicate boundaries of MeA
regions. b, Similar responses were not observed in other amygdala nuclei that
receive olfactory input, including the postero-medial cortical amygdala
(PMCo), anterior cortical amygdala (CoA) and postero-lateral cortical
amygdala (PLCo) (mean ± s.e.m., n = 3).

Extended Data Figure 7 | Trpc2−/− males show increased sexual behaviour
towards wild-type juveniles. a, b, Histograms of mounts by minute of social
interaction and intermount intervals shown towards juveniles by Trpc2+/+ and
Trpc2−/− males (sum, n = 12). Inset depicts average intermount intervals
(mean ± s.e.m., *P < 0.05, **P < 0.01, Mann-Whitney U-test). c, Analysis of
adult male sexual behaviour during simultaneous interaction with juvenile and
adult oestrous females. Trpc2+/+ and Trpc2−/− males show similar amounts of
sexual behaviour towards adult oestrous females, but Trpc2−/− males show
increased sexual behaviour towards juveniles (n = 10, averages ± s.e.m.,
*P < 0.05, **P < 0.01, one-way multivariate ANOVA).

Extended Data Figure 8 | Trpc2−/− males show sexual behaviour towards
juvenile males. a, Raster plots depicting individual mounting displays of adult
Trpc2+/+ and Trpc2−/− males towards juvenile males (C57BL/6, postnatal day
17) during social interaction (30 min). Each tick indicates onset of one mount.
b, Quantitative analysis of parameters associated with sexual behaviour towards
juvenile males shown by Trpc2+/+ and Trpc2−/− males (n = 11 or 12,
averages ± s.e.m., *P < 0.05, **P < 0.01, Mann-Whitney U-test).

Extended Data Figure 9 | ESP22 did not decrease social investigation time.
Wild-type C57BL/6 males were introduced to C3H juvenile females painted
with stimuli indicated. Social investigation time of the male was recorded as
time spent with the nose in direct contact with the female. These data were
extracted from the same experiments reported in Fig. 4c, d, with additional
experiments involving TMT (100 µl, 155 mM, n = 11 or 12, averages ± s.e.m.,
**P < 0.01, one-way ANOVA followed by Tukey’s HSD post hoc tests).

Extended Data Figure 10 | ESP22 (10 µg) inhibits sexual behaviour of C3H
males. a, Raster plots of sexual behaviour shown by C3H males towards C3H
juvenile females (postnatal day 17) painted with indicated stimuli (30 min
social interaction). Each tick indicates onset of one mount. b, Quantitative
analysis of parameters associated with sexual behaviour towards juvenile
females shown by C3H males (n = 11, averages ± s.e.m., *P < 0.05,
**P < 0.01, one-way ANOVA followed by Tukey’s HSD post hoc tests).


Diabetic hyperglycaemia activates CaMKII and
arrhythmias by O-linked glycosylation 


Jeffrey R. Erickson, Laetitia Pereira, Lianguo Wang, Guanghui Han, Amanda Ferguson, Khanha Dao, Ronald J. Copeland,
Florin Despa, Gerald W. Hart, Crystal M. Ripplinger & Donald M. Bers


Ca²⁺/calmodulin-dependent protein kinase II (CaMKII) is an enzyme
with important regulatory functions in the heart and brain, and its 
chronic activation can be pathological. CaMKII activation is seen in 
heart failure, and can directly induce pathological changes in ion 
channels, Ca²⁺ handling and gene transcription’. Here, in human,
rat and mouse, we identify a novel mechanism linking CaMKII and 
hyperglycaemic signalling in diabetes mellitus, which is a key risk 
factor for heart” and neurodegenerative diseases**. Acute hypergly- 
caemia causes covalent modification of CaMKII by O-linked N- 
acetylglucosamine (O-GlcNAc). O-GlcNAc modification of CaMKII 
at Ser 279 activates CaMKII autonomously, creating molecular memory 
even after Ca²⁺ concentration declines. O-GlcNAc-modified CaMKII
is increased in the heart and brain of diabetic humans and rats. In 
cardiomyocytes, increased glucose concentration significantly enhances 
CaMKII-dependent activation of spontaneous sarcoplasmic reticulum
Ca²⁺ release events that can contribute to cardiac mechanical
dysfunction and arrhythmias’. These effects were prevented by phar- 
macological inhibition of O-GlcNAc signalling or genetic ablation 
of CaMKII. In intact perfused hearts, arrhythmias were aggravated 
by increased glucose concentration through O-GlcNAc- and CaMKII- 
dependent pathways. In diabetic animals, acute blockade of O-GlcNAc 
inhibited arrhythmogenesis. Thus, O-GlcNAc modification of CaMKII 
is a novel signalling event in pathways that may contribute critically to 
cardiac and neuronal pathophysiology in diabetes and other diseases. 

Under basal conditions, CaMKII is autoinhibited by an interaction 
between the regulatory and catalytic subunits of each CaMKII monomer 
(Fig. 1a). Ca²⁺/calmodulin (Ca²⁺/CaM) binding to the regulatory
domain disrupts autoinhibition, opening the structure to allow the 
catalytic domain to phosphorylate targets’. This conformational change 
is also the basis for fluorescence resonance energy transfer (FRET) 
changes in a CaMKII activity reporter called Camui, which uses full- 
length CaMKII and attached green fluorescent proteins®’ (Fig. 1a). 
Open-state CaMKII is subject to post-translational modifications, inclu- 
ding phosphorylation at T286 (ref. 8) and oxidation at the MM280/ 
281 pair’, which stabilize CaMKII in the open state even when Ca²⁺/
CaM dissociates, creating molecular memory but also having poten- 
tially pathological effects’. We tested whether diabetic hyperglycaemia 
might alter CaMKII activity. 

Using Camui as a direct CaMKII activity reporter, cells exposed to 
glucose-free or low-glucose (100 mg dl⁻¹; 5.5 mmol l⁻¹) conditions
did not exhibit autonomous CaMKII activity (in lysates plus Ca²⁺/CaM
and EGTA buffer) (Fig. 1b, white bars). However, glucose levels
corresponding to borderline or severe diabetes (240-500 mg dl⁻¹)
induced robust autonomous CaMKII activation. The non-metabolizable 
sugar mannitol did not activate autonomous CaMKII activity (Extended 
Data Fig. 1a). Glucose-dependent CaMKII activation was still present 
in CaMKII mutants lacking critical autophosphorylation and oxida- 
tion sites (Extended Data Fig. 1b, c), ruling out involvement of those 
pathways. 
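
The glucose levels above are quoted in mg dl⁻¹, with one value also given in mmol l⁻¹. As a reading aid, the conversion can be sketched as follows. This is a minimal illustration, not part of the study's methods: the function names are hypothetical, and the only outside fact used is the molar mass of glucose (about 180.16 g mol⁻¹).

```python
# Illustrative unit conversion for glucose concentrations.
# Assumption: molar mass of D-glucose (C6H12O6) is 180.16 g/mol.
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.16


def mg_dl_to_mmol_l(mg_dl: float) -> float:
    """Convert a glucose concentration from mg/dl to mmol/l."""
    # mg/dl -> mg/l is a factor of 10; g/mol is numerically equal to mg/mmol.
    return mg_dl * 10.0 / GLUCOSE_MOLAR_MASS_G_PER_MOL


def mmol_l_to_mg_dl(mmol_l: float) -> float:
    """Convert a glucose concentration from mmol/l to mg/dl."""
    return mmol_l * GLUCOSE_MOLAR_MASS_G_PER_MOL / 10.0


if __name__ == "__main__":
    # Concentrations mentioned in the text (mg/dl).
    for level in (100, 240, 350, 500):
        print(f"{level} mg/dl = {mg_dl_to_mmol_l(level):.1f} mmol/l")
```

With this conversion, 100 mg dl⁻¹ corresponds to roughly 5.5 mmol l⁻¹, in agreement with the low-glucose condition quoted above.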

Figure 1 | Glucose-induced CaMKII activity is O-GlcNAc dependent.
a, Schematic of Camui CaMKII sensor (numbers are amino acid sequence
numbers; Assn is the association domain of CaMKII). b, Direct (Ca²⁺/CaM) or
autonomous (+EGTA) activation of Camui measured in lysates after HEK-cell
exposure to indicated glucose concentration. F_CFP/F_YFP, ratio of fluorescence
of cyan fluorescent protein (CFP) to fluorescence of yellow fluorescent protein
(YFP). WT, wild type. c, S279 mutation does not affect Ca²⁺/CaM-,
phosphorylation- or oxidation-dependent activity. d, Glucose-dependent
Camui activation is ablated by KN-93 or Ca²⁺ chelation (+EGTA).
e, Increased glucose concentration enhances pacing- or Iso-induced CaMKII
activity. f, Glucose-dependent CaMKII activity measured by ³²P incorporation.
g, Glucose-dependent Camui activation is enhanced by Thm-G and ablated by
O-GlcNAc inhibition (+DON) (n = 3 for all data points). Data show
mean ± standard error of the mean (s.e.m.); experiments are with three
preparations done in triplicate, unless indicated. *P < 0.05 versus control.

Post-translational modification by O-GlcNAc (‘O-GlcNAcylation’)
can alter protein function, and such regulation is seen in heart and
brain proteins. O-GlcNAcylation is enhanced by elevated glucose
concentration, which raises levels of the direct substrate (uridine diphosphate
(UDP)-N-acetylglucosamine) of the enzyme O-GlcNAc transferase
(OGT). O-GlcNAc groups are removed by the enzyme O-GlcNAcase.
We tested whether direct O-GlcNAcylation might mediate glucose-induced
autonomous CaMKII activation, analogous to autophosphorylation
in the conserved CaMKII regulatory domain (Extended Data
Fig. 1e). Two consensus O-GlcNAcylation sites are T286 and S279.
T286A-mutant Camui only slightly limited glucose-induced autonomous 
activation (Extended Data Fig. 1b), but that could be indirect, through 
synergy between O-GlcNAcylation at another site enhancing T286 
autophosphorylation. 

Remarkably, S279A-mutant Camui abolished glucose-induced autonomous
CaMKII activation (Fig. 1b, black bars). Importantly, S279A had
no effect on either direct CaMKII activation or on autonomous activity
induced by autophosphorylation or oxidation (Fig. 1c). Thus, S279
may be a specific target for O-GlcNAc-mediated CaMKII activity during
hyperglycaemia. 

High glucose did not alter the CaMKII activation state in cells kept 
in Ca²⁺-free, EGTA-containing conditions (Fig. 1d). When cells were
exposed to elevated glucose (and normal Ca²⁺), the subsequently measured
maximal Ca²⁺/CaM-dependent activity was enhanced (Fig. 1d,
middle bars). Pre-treatment with the CaMKII inhibitor KN-93 (which
locks CaMKII in the closed high-FRET state) prevented autonomous
activation by high glucose, even in the presence of Ca²⁺/CaM. Rat
cardiomyocytes expressing Camui and exposed to high glucose (without
stimulation) for 24 h showed no significant change in baseline CaMKII
activation versus low-glucose myocytes (Fig. 1e). However, increasing
intracellular [Ca²⁺], either by pacing (0.5 Hz for 30 s) or isoprenaline
(Iso; 100 nM for 20 min), yielded significantly greater CaMKII activation
at higher glucose concentrations. These observations suggest that
O-GlcNAcylation of CaMKII is analogous to autophosphorylation and
oxidation, requiring initial opening via Ca²⁺/CaM.

To confirm our Camui observations, we cultured rat myocytes for
24 h in varying glucose concentrations and measured autonomous
CaMKII activity (± Iso) using a standard assay (³²P incorporation
into a CaMKII substrate; Fig. 1f). CaMKII activity was increased by
glucose >200 mg dl⁻¹, and by combined Iso and high glucose.
O-GlcNAcylation is dynamic in cells and limited by glucose availability
and the enzymatic functions of OGT and O-GlcNAcase. Specific
inhibition of glutamine-fructose amidotransferase by 50 µM diazo-5-oxonorleucine
(DON) to prevent production of the OGT substrate
(Fig. 4g), and hence O-GlcNAcylation, abolished glucose-induced autonomous
CaMKII activation (Fig. 1g). Conversely, inhibition of O-GlcNAcase
with 100 nM thiamet-G (Thm-G) promoted O-GlcNAc
modification and enhanced myocyte CaMKII activity in conditions of
elevated glucose concentration. Mutant-S279A Camui was not appreciably
activated by high glucose in intact cells (Extended Data Fig. 1d).
Thus, glucose-induced CaMKII activity involves S279 and an O-GlcNAc-dependent
pathway.

To determine the extent of CaMKII O-GlcNAcylation in heart and 
brain, we used a custom-designed antibody that specifically recognizes 
this modification. The fraction of CaMKII that was O-GlcNAc-modified
and autophosphorylated was increased in rat myocytes cultured in
high relative to normal glucose (350 versus 150 mg dl⁻¹, Fig. 2a), confirming
that high glucose induces O-GlcNAc modification and increased
activation of CaMKII. O-GlcNAc modification of CaMKII was blocked
by KN-93 and in the S279A mutant (Fig. 2a), whereas it was enhanced
by treating myocytes with 100 nM Iso 20 min before lysis (Extended
Data Fig. 2a). We verified that the antibody reacted specifically to
O-GlcNAc by immunoblots before and after β-elimination reactions
that specifically cleaved O-linked glycans without degrading proteins
(Extended Data Fig. 2b). The O-GlcNAc antibody no longer recognized
high-glucose-treated CaMKII after β-elimination, but CaMKII levels
were unaltered. O-GlcNAc modification of CaMKII was also disrupted
by 50 µM DON and enhanced by 100 nM Thm-G (Extended Data
Fig. 2c). We subjected peptides encoding the regulatory domain of
CaMKII to in vitro labelling with O-GlcNAc transferase and confirmed
the S279 site as a target for O-GlcNAc modification using electron-transfer-dissociation
mass spectrometry (ETD-MS; Extended Data Fig. 3).

Figure 2 | O-GlcNAcylation of CaMKII occurs in vivo. a, Immunoblot (IB)
with O-GlcNAc-specific and CaMKII-phospho-T286-specific (P-T286)
antibodies shows that high glucose concentration (350 mg dl⁻¹) increases
O-GlcNAcylation and activation of CaMKII, but not in S279A-mutant CaMKII
or after KN-93 treatment (n = 3 myocyte preparations). IP,
immunoprecipitate. b, The ratio of O-GlcNAc-modified to total CaMKII is
increased in heart (n = 6 hearts per group) and brain (n = 3 brains per group)
from diabetic versus control non-diabetic human patients. Ctrl, control. HF,
heart failure. c, O-GlcNAc modification of CaMKII is also increased in heart
and brain from diabetic rats compared with wild-type controls (number of rats
indicated). Data are shown as mean ± s.e.m. *P < 0.05, **P < 0.01 versus
control.

CaMKII expression is increased in patients with heart failure, and
elevated CaMKII expression and activity have been implicated in the
transition to heart failure. Using the O-GlcNAc-specific antibody,
we probed cardiac samples (holding total CaMKII constant) from patients
with heart failure and diabetes (blood glucose >400 mg dl⁻¹), alongside
failing and non-failing non-diabetic hearts (blood glucose <200 mg dl⁻¹).
The fraction of CaMKII that was O-GlcNAc modified was doubled in 
heart failure patients and nearly tripled in heart failure patients with 
diabetes versus those with non-failing, non-diabetic hearts (Fig. 2b). 
Similarly, brain samples from people with diabetes had significantly 
increased O-GlcNAc-modified CaMKII relative to those from non- 
diabetic people. In a diabetic rat model with reduced insulin secretion 
and blood glucose >600 mg dl⁻¹ (ref. 21), CaMKII O-GlcNAcylation
was greatly elevated in heart and brain samples versus in control rats 
(Fig. 2c). Interestingly, CaMKII autophosphorylation was also enhanced 
in cardiac tissue from diabetic rats (Extended Data Fig. 2d), consistent 
with synergistic CaMKII activation through these mechanisms. Taken 
together, our data demonstrate that CaMKII O-GlcNAcylation and 
activation occur in the heart and brain of diabetic subjects. 
Ryanodine receptor (RyR) phosphorylation by CaMKII enhances
cardiac sarcoplasmic reticulum Ca²⁺ release events (Ca²⁺ sparks and
waves). In intact isolated myocytes, GlcNAcase inhibition with
Thm-G or elevated glucose concentration alone increased Ca²⁺ spark
and wave frequency (Fig. 3a-c). The Thm-G-induced Ca²⁺ spark increase
occurred without altered sarcoplasmic reticulum Ca²⁺ content (Extended
Data Fig. 4a) and was prevented by the CaMKII inhibitor KN-93, but
not its inactive analogue KN-92. Combining Thm-G treatment with
increased glucose concentration (350 mg dl⁻¹) markedly increased Ca²⁺
sparks and waves, consequently depleting sarcoplasmic reticulum Ca²⁺
(Extended Data Fig. 5). Thus, hyperglycaemia and reduced GlcNAcase
activity synergize in activating sarcoplasmic reticulum Ca²⁺ release. Acute
blockade of either CaMKII (+KN-93) or O-GlcNAcylation (+DON)
prevented glucose-dependent Ca²⁺ sparks (Fig. 3c), but did not alter
sarcoplasmic reticulum Ca²⁺ load (Extended Data Fig. 4b). Ca²⁺ spark
frequency versus control was altered neither by DON (Fig. 3c) nor by the
non-metabolizable sugar mannitol (Extended Data Fig. 4c). Thus, glucose-induced
arrhythmogenic Ca²⁺ waves occur through a CaMKII- and
O-GlcNAc-dependent mechanism.

Figure 3 | Glucose-induced cardiac Ca²⁺ sparks are O-GlcNAc and CaMKII
dependent. a, Ca²⁺ sparks and waves are increased by GlcNAcase inhibitor
Thm-G. b, Thm-G-induced Ca²⁺ sparks are prevented by CaMKII inhibitor
KN-93 (but not the inactive analogue KN-92). c, Glucose-induced Ca²⁺ sparks
are ablated by CaMKII inhibitor KN-93 and the O-GlcNAc inhibitor DON.
d, Thm-G induces Ca²⁺ sparks in wild-type (WT) mice, but not mice lacking
CaMKIIδ (CaMKIIδ-KO; baseline Ca²⁺ spark frequency differs in rat versus
mouse myocytes). Data are shown as mean ± s.e.m. The number of myocytes is
indicated. *P < 0.05, **P < 0.01 versus control, ***P < 0.05 versus
350 mg dl⁻¹ glucose.

Figure 4 | Glucose-induced arrhythmias are suppressed by DON and KN-93.
a, Electrocardiograms (ECGs) during baseline conditions (top), high glucose
concentrations (HG), and DON pre-treatment plus high glucose. b, PVCs
(per 15 min) increased after glucose concentration was raised (versus baseline).
DON or KN-93 pre-treatment reduced glucose-induced PVCs, but had no
effect under baseline conditions. c, Activation maps for normal activation (top)
and high-glucose-induced PVC (bottom). d, Action potentials (black) and
Ca²⁺ transients (red) from area indicated by white boxes in c. Vm, membrane
voltage. e, f, Caffeine- (Caff) and dobutamine- (DOB) induced PVCs and
ventricular tachycardia (VT) were inhibited by DON in diabetic rats in vivo.
g, Working model of O-GlcNAc-induced and CaMKII-dependent arrhythmic
events. DADs, delayed afterdepolarizations; Frc-6-P, fructose-6-phosphate; I_Ca,
Ca²⁺ current; PLB, phospholamban. Data show mean ± s.e.m.; n = 3 hearts
(a-d) or animals (e, f). *P < 0.05 versus control. NS, not significant.

To test whether CaMKIIδ (the dominant cardiac isoform) is required
for O-GlcNAc-dependent effects on sarcoplasmic reticulum Ca²⁺ release,
we used myocytes from CaMKIIδ-knockout mice. Neither Ca²⁺ transient
amplitude nor sarcoplasmic reticulum Ca²⁺ load (Extended Data
Fig. 4d, e) was altered by acute Thm-G exposure in wild-type or CaMKIIδ-knockout
mouse cells. Ca²⁺ spark frequency was significantly enhanced by
Thm-G in wild-type but not in CaMKIIδ-knockout myocytes (Fig. 3d).

Using optical mapping in Langendorff-perfused rat hearts exposed 
to 400 mg dl⁻¹ glucose, we observed a significant increase in premature
ventricular complexes (PVCs) compared with baseline (Fig. 4a),
consistent with observations of enhanced PVCs in human diabetic
patients. This effect was attenuated by inhibiting either CaMKII (using
KN-93) or O-GlcNAc (using DON) (Fig. 4b). We also mapped intracellular
[Ca²⁺] ([Ca²⁺]i) and voltage simultaneously. Epicardial activation
during PVCs was typified by markedly slowed conduction and activation
times compared with normal activation (Fig. 4c). Additionally,
spontaneous diastolic [Ca²⁺]i elevation preceded the action potential
upstroke in high glucose conditions (Fig. 4d), an effect prevented by
blocking OGT using DON pre-treatment (Extended Data Fig. 6a, b).

We also found higher in vivo arrhythmia susceptibility in normal and 
diabetic rats during challenge with caffeine and dobutamine (Fig. 4e, f). 
DON pre-treatment ablated arrhythmias induced by caffeine and 
dobutamine in diabetics, but had no effect on baseline or caffeine/ 
dobutamine-induced arrhythmia in non-diabetic rats. We also confirmed 
that CaMKII activity is elevated in diabetic rat hearts, and that this effect 
is blunted by pre-treatment of rats with DON (Extended Data Fig. 6c), 
consistent with O-GlcNAc- and CaMKII-dependent hyperglycaemia- 
induced arrhythmogenesis. 

We identified a novel mechanism for autonomous CaMKII activation
by O-GlcNAc modification at CaMKII S279 (Fig. 4g). Acute extracellular
glucose elevation, to levels that mimic those in diabetic patients,
suffices to activate CaMKII through this pathway in intact cardiac myocytes
and leads to arrhythmic events in intact hearts and animals. In
diabetic hearts and brains CaMKII O-GlcNAcylation is elevated and
this may contribute to pathological alterations in cardiac myocytes and
neurons. Indeed, this pathway may synergize with autonomous CaMKII
activation by phosphorylation and oxidation, which are important
in signalling in many cell types. CaMKIV, related to CaMKII, is
O-GlcNAcylated at S189, which inhibits its activation by CaM kinase
kinase. O-GlcNAc-mediated activation in CaMKII is not analogous
to the inhibition seen for CaMKIV.

The S279 site is highly conserved in all mammalian CaMKII isoforms 
(Extended Data Fig. 1e), and the robust functional effects in cardiomyocytes
suggest that hyperglycaemia can readily activate CaMKII in
both heart and brain, and alter phosphorylation of multiple CaMKII
targets (including CaMKII itself) to exert both acute (for example,
altered Ca²⁺ handling/arrhythmias) and chronic (for example, transcriptional
regulation) effects in many tissues. CaMKII is an important
nodal point in both acute and chronic modulation of ion channels in 
both heart and brain. Overactivation of CaMKII caused by hypergly- 
caemia during diabetes may lead to widespread and as yet unappreciated 
pathological consequences that merit exploration. It is already known 
that overactivation of CaMKII occurs in heart failure and neuronal 
excitotoxicity, and that this activated CaMKII can contribute to major 
dysfunction at the level of acute ion channel modulation that contributes 
to cardiac arrhythmias, reduced contractility, neuronal damage and
altered gene transcription. In diabetes, these powerful CaMKII signalling
pathways are likely to be activated by hyperglycaemia-induced
O-GlcNAc modification of CaMKII, and this should be considered in 
future therapeutic strategies. This could also broaden the impact of 
CaMKII inhibitors in therapeutics in heart disease and beyond. 


METHODS SUMMARY 


Camui constructs were generated as previously described. HEK293 cells were kept
in culture for 24 h and transiently transfected with expression plasmids encoding
Camui. Cells were cultured for an additional 24 h after transfection. Fluorescence
measurements were performed using fluorescence spectrophotometry, excited at
440 nm with emission recorded at 477 nm for cyan fluorescent protein (CFP) (F_CFP)
and 527 nm for yellow fluorescent protein (YFP) (F_YFP). Camui fluorescence and
ratio (F_CFP/F_YFP) were measured with 10 µM CaM plus 200 µM Ca²⁺ for maximal
activity and with 1 mM EGTA to chelate Ca²⁺ for assessing deactivated or autonomous
CaMKII activation. CaMKII activity was directly measured as incorporation
of ³²P from [γ-³²P]ATP into an artificial substrate as previously described. Ca²⁺
transients and sparks were recorded using confocal microscopy. Pooled data are
represented as mean ± s.e.m. Statistical comparisons were made with repeated two-way
analysis of variance and paired Student's t-test where applicable. P < 0.05 was
considered significant. 

METHODS


Construction of adenoviral vectors encoding biosensors. The Camui construct
was incorporated in adenoviruses using the AdEasy adenoviral vector system 
(Qbiogene) to ensure high infection efficiency in terminally differentiated adult 
ventricular myocytes. Mutant variants of Camui (T286A, CM280/281VV and S279A) 
were generated using the commercially available QuickChange site-directed muta- 
genesis kit (Stratagene), and likewise incorporated into adenovirus. 
HEK293-cell transfection. HEK293 cells (mycoplasma free at the time of this study) 
were cultured in DMEM (Invitrogen) with 5% FBS and penicillin/streptomycin for 
24 h, and then transiently transfected with expression plasmids encoding Camui
using a mammalian transfection kit (Stratagene). Cells were cultured for an addi- 
tional 36 h after transfection. Camui expression was checked by fluorescence micro- 
scopy before experiments. 

Human and rat models of diabetes. Failing hearts from type 2 diabetic and non- 
diabetic patients were obtained at the time of orthotopic heart transplantation as a 
gift from K. Margulies (University of Pennsylvania). Brain samples from human 
temporal cortex were obtained as a gift from L.-W. Jin and M. Melara (University 
of California, Davis). All human specimens, including failing heart tissues and 
brain samples, were obtained in accordance with Institutional Review Board approval 
at the respective institutions where samples were collected. All tissue was obtained 
with informed consent before transplantation surgery. Inclusion in tissue-based 
studies was not restricted on the basis of age, sex, race or ethnic status. Human 
cardiac tissue samples were divided into three groups: a non-failing and non-diabetic
group (Fig. 2b control: 3 males, 3 females; ages 42-60), a heart failure
group not under treatment for diabetes and with blood glucose <200 mg dl⁻¹
(Fig. 2b heart failure: 3 males, 3 females; ages 41-63), and a heart failure group
with diagnosed diabetes and blood glucose >200 mg dl⁻¹ (Fig. 2b heart failure/diabetes:
4 males, 2 females; ages 38-66). Human brain tissue samples were divided
into two groups: a non-diabetic group (Fig. 2b control: 3 females; ages 79-89), and
a diabetic group with blood glucose >200 mg dl⁻¹ (Fig. 2b diabetes: 3 females; ages
65-82). Male Sprague-Dawley (SD) rats transgenic for human amylin in the pancreatic
β-cells (HIP rats) were used at age 10-12 months as previously described²⁸.
Animal studies were not randomized or blinded for this study. Sample sizes were
determined by power analysis or based on previous studies with the selected
models²⁸. Blood glucose levels in rats were measured 1 day before experiments
were conducted using a OneTouch Ultra glucose meter (LifeScan; model no. AW
060-213-01A). All diabetic HIP rats had a blood glucose concentration of over
600 mg dl⁻¹ at the time of death.

In vitro fluorescence and CaMKII activity assays. Fluorescence measurements
were performed using an MS SpectraMax plate reader spectrophotometer (Molecular
Devices). Excitation and emission slits were set at 4 nm. An excitation wavelength
of 440 nm was used, and dual photon counting emission detectors were set at
477 nm (F_CFP) and 527 nm (F_YFP), respectively. HEK cells or rat ventricular myocytes
expressing Camui were treated with 10 µM CaM and 200 µM Ca²⁺, then
lysed in a buffer containing 50 mM Tris-HCl (pH 7.5), 5 mM MgCl₂ and protease
inhibitors to measure ‘direct’ activation (for example, see Fig. 1b). For autonomous
Camui fluorescence measurements, Ca²⁺/CaM treatment was performed in the
presence of 100-500 mg dl⁻¹ glucose (Fig. 1b, d, e, g), 100 µM ATP (Fig. 1c) or
1 µM H₂O₂ (Fig. 1c). Cell lysis was then performed in a buffer containing 1 mM
EGTA to chelate Ca²⁺ and isolate the kinase activity attributed to autonomous
(rather than direct) activation. In Fig. 1d, cells were pretreated with 1 mM EGTA or
10 µM KN-92/KN-93. In Fig. 1e, myocytes were not directly treated with Ca²⁺/CaM,
but instead were subjected to pacing (0.5 Hz) or treated with Iso (100 nM) in
the presence of 100 or 240 mg dl⁻¹ glucose. In Fig. 1g, cells were treated with Ca²⁺/CaM
and increasing glucose concentration in the presence of either 50 µM DON or
100 nM Thm-G. In Fig. 1f, CaMKII kinase activity was determined at increasing
glucose concentrations and in the presence (and absence) of Iso using a kinase
assay that measures incorporation of ³²P-ATP into an artificial substrate, syntide-2,
as previously described (Fig. 1).
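
The Camui readout described above is the ratio of CFP to YFP emission (F_CFP/F_YFP). Because KN-93 is said to lock CaMKII in the closed, high-FRET state, an open (activated) sensor corresponds to less FRET and therefore a higher F_CFP/F_YFP ratio. The sketch below shows one way such a ratio might be computed and expressed relative to a control condition. It is an illustration only: the function names, the normalization to control and the example intensities are assumptions, not the authors' analysis pipeline.

```python
# Hypothetical helper functions for a CFP/YFP ratiometric FRET readout.
# The intensity values used below are made-up illustrative numbers.


def camui_ratio(f_cfp: float, f_yfp: float) -> float:
    """Return F_CFP / F_YFP for one measurement (emission at 477 and 527 nm)."""
    if f_yfp <= 0:
        raise ValueError("YFP emission must be positive")
    return f_cfp / f_yfp


def relative_activation(ratio: float, control_ratio: float) -> float:
    """Express a ratio as fold change over a control (e.g. low-glucose) ratio.

    Values above 1 indicate reduced FRET, i.e. a more open sensor.
    """
    if control_ratio <= 0:
        raise ValueError("control ratio must be positive")
    return ratio / control_ratio


if __name__ == "__main__":
    control = camui_ratio(f_cfp=1000.0, f_yfp=2000.0)       # illustrative
    high_glucose = camui_ratio(f_cfp=1200.0, f_yfp=1800.0)  # illustrative
    print(f"fold change: {relative_activation(high_glucose, control):.2f}")
```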

ETD-MS analysis. The peptide was analysed using an LTQ-Orbitrap XL mass 
spectrometer (Thermo Fisher Scientific) with ETD. The synthetic peptide was 
O-GlcNAcylated by in vitro labelling with OGT overnight at 4 °C. After the
O-GlcNAcylation reaction, the pH was adjusted to approximately 3 using 10% formic
acid, and the sample was desalted with a C18 spin column (The Nest Group). The
desalted sample was lyophilized to dryness using a Speed Vac concentrator and reconstituted
in 50% methanol, 0.2% acetic acid to a final concentration of 10 pmol µl⁻¹.
The sample was directly infused into the LTQ-Orbitrap XL mass spectrometer
at a flow rate of 1 µl min⁻¹ using a spray voltage of 1.9 kV. The full MS scans
were acquired in the FT analyser with the following parameters: resolution 100,000; 
mass scan range mass-to-charge ratio (m/z) 300-800; and microscans 1. The ETD- 
MS2 scans were acquired in the ion-trap analyser with the following parameters: 
reagent AGC target 4 X 10°; mass scan range m/z 250-2,000; microscans 1; isola- 
tion width m/z 1; and ETD activation time 80 ms. 
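
In ETD spectra, a modified residue is located from the spacing between consecutive c-type fragment ions: the gap equals the residue mass plus the mass of any modification it carries, divided by the fragment charge. The sketch below works through that arithmetic for an O-GlcNAc-modified serine. It is a hedged illustration and not the authors' analysis: the function name is hypothetical, the masses are standard monoisotopic values (serine residue 87.03203 Da, HexNAc addition 203.07937 Da), and the peptide sequence and charge states are not specified here.

```python
# Monoisotopic masses in daltons (standard values, stated as assumptions).
SERINE_RESIDUE_MASS = 87.03203   # C3H5NO2
HEXNAC_ADDITION_MASS = 203.07937  # C8H13NO5, the O-GlcNAc moiety


def fragment_spacing(residue_mass: float,
                     modification_mass: float = 0.0,
                     charge: int = 1) -> float:
    """m/z spacing between consecutive c-ions flanking one residue."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (residue_mass + modification_mass) / charge


if __name__ == "__main__":
    plain = fragment_spacing(SERINE_RESIDUE_MASS)
    modified = fragment_spacing(SERINE_RESIDUE_MASS, HEXNAC_ADDITION_MASS)
    print(f"unmodified Ser spacing (1+): {plain:.4f}")
    print(f"O-GlcNAc Ser spacing (1+):   {modified:.4f}")
```

A gap of about 290.11 between two singly charged c-ions, rather than about 87.03, is what would indicate a HexNAc on the intervening serine.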

Myocyte isolation and adenoviral infection. All protocols involving animals
were performed in accordance with the Guide for the Care and Use of Laboratory
Animals and approved by the University of California, Davis Institutional
Animal Care and Use Committee. Adult rat ventricular myocytes were isolated as
previously described. Myocytes were seeded on laminin-coated coverslips in
serum-free PC-1 medium (Lonza) supplemented with penicillin/streptomycin.
Myocytes were infected for 2 h at a multiplicity of infection of 10-100 with adenovirus
encoding Camui, followed by replacement with fresh medium. Infected
cells were kept in culture for 36 h with one final replacement of fresh medium 1 h
before experiments.

Confocal microscopy imaging. Coverslips were mounted on the stage of an 
inverted microscope (Zeiss, LSM5 Pascal) equipped with a 40×, 1.4 NA water
immersion objective lens. Argon laser excitation wavelengths were 458 nm for CFP
and 514 nm for YFP. CFP emission fluorescence was measured by confocal microscopy
at 485 ± 15 nm, and YFP emission fluorescence was measured at ≥535 nm.
Camui imaging experiments were performed as previously described. ImageJ software
was used for image analysis.

Spark measurements. Intact ventricular myocytes were loaded with Fluo-3 AM
(5 µM; Molecular Probes) and transients were recorded as previously described.
Ca²⁺ transients were obtained by field stimulation at 1 Hz. Sarcoplasmic reticulum
Ca²⁺ load was evaluated by the Ca²⁺ transient upon rapid caffeine application
(10 mM). Experiments were performed with confocal microscopy (BioRad,
Radiance 2100, 40× objective) using line scan mode with an argon laser (excitation
at 488 nm, emission at >505 nm). Image analysis used ImageJ software and homemade
routines in interactive data language (IDL).

Langendorff-perfused rat hearts. All procedures involving animals were approved 
by the Animal Care and Use Committee of the University of California, Davis and 
adhered to the Guide for the Care and Use of Laboratory Animals published by the 
National Institutes of Health. Adult male Sprague-Dawley rats (250-300 g) were 
anaesthetized with pentobarbital sodium (150 mg kg⁻¹ intraperitoneal (IP)) containing
500 IU kg⁻¹ of heparin. After a midsternal incision, hearts were rapidly
excised and Langendorff perfused at 37 °C with oxygenated (95% O₂, 5% CO₂)
modified Tyrode’s solution of the following composition (in mmol l⁻¹): NaCl
128.2, CaCl₂ 1.3, KCl 4.7, MgCl₂ 1.05, NaH₂PO₄ 1.19, NaHCO₃ 20 and glucose
11.1 (pH 7.4). Flow rate (6-15 ml min⁻¹) was adjusted to maintain a perfusion
pressure of 60-70 mm Hg. One leaflet of the mitral valve was carefully damaged
with sharp forceps inserted through the pulmonary vein to prevent solution congestion
in the left ventricular cavity after suppression of ventricular contraction.
This also prevented acidification of the perfusate and the development of ischaemia
in the left ventricle. Two Ag/AgCl disc electrodes were positioned in the bath
to record an ECG analogous to a lead I configuration. ECG was continuously 
recorded throughout the duration of the experiment. A bipolar pacing electrode 
was positioned on the base of the left ventricular epicardium for pacing, which was 
performed at a basic cycle length (BCL) of 200 ms using a 2 ms pulse width at twice 
the diastolic threshold. 

Dual optical mapping of Vm and Ca²⁺. Hearts were loaded with the fluorescent
intracellular Ca²⁺ indicator Rhod-2 AM (Molecular Probes; 250 µl of 1 mg ml⁻¹
in dimethylsulphoxide (DMSO) containing 10% pluronic acid) and were subsequently
stained with the voltage-sensitive dye RH237 (Molecular Probes; 25 µl of
1 mg ml⁻¹ in DMSO). Blebbistatin (Tocris Bioscience; 10-20 µM) was added to
the perfusate to eliminate motion artefact during optical recordings. The anterior
epicardial surface was excited using LED light sources centred at 530 nm and
band-pass filtered from 511 to 551 nm (LEX-2; SciMedia) and focused directly on
the surface of the preparation. The emitted fluorescence was collected through a
50 mm objective (Nikon) and split with a dichroic mirror at 630 nm (Omega). The
longer wavelength moiety, containing the Vm signal, was long-pass filtered at
700 nm, and the shorter wavelength moiety, containing the Ca²⁺ signal, was
band-pass filtered between 574 and 606 nm. The emitted fluorescence signals were
recorded using two CMOS cameras (MiCam Ultima-L; SciMedia) with a sampling
rate of 1 kHz and 100 × 100 pixels with a 20 × 20 mm field of view. The atrioventricular
node was ablated using a fine-tip thermal cautery (Acuderm) to produce a
slow intrinsic rhythm which allowed for ectopic activity and PVCs to escape. After
loading the dyes, baseline electrophysiological parameters were recorded during
normal rhythm as well as left ventricular epicardial pacing at a BCL of 200 ms.
Hearts were then subjected to hyperglycaemia (400 mg dl⁻¹) with (n = 3) or without
(n = 5) pre-treatment (10 min) with the O-GlcNAc inhibitor DON (50 µM).
Optical recordings were taken every 5 min after treatment and ECG was continuously
recorded.

In vivo ECG recordings. In vivo experiments were performed in anaesthetized
diabetic rats (blood glucose >500 mg dl⁻¹). Rats received an injection of caffeine
(IP; 120 mg kg⁻¹) and dobutamine (intravenous; 50 µg kg⁻¹) during in vivo
experiments. The same individuals were pre-treated (30 min before caffeine/dobutamine
challenge) with an IP injection of DON (5 mg kg⁻¹). Experiments were
done 1 week apart, and some individuals received the reverse of the described
procedure (+DON in first trial, -DON in second trial) to control for compensation
effects between trials. For quantification of arrhythmia scores, the severity of
arrhythmias was quantified using a previously published scoring system³⁰. Each
individual heart was evaluated by means of a 5-point arrhythmia score, where single 
PVCs were given a score of 1, bigeminy/salvos a score of 2, ventricular tachycardia 
a score of 3, ventricular fibrillation a score of 4, spontaneous ventricular fibrillation 
a score of 5, and an assigned number corresponded to the most severe type of 
arrhythmia observed in that heart. Scores were used for group analysis of severity 
of arrhythmias. 
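
The scoring rule above (each heart receives the score of its most severe arrhythmia) can be sketched in a few lines. This is an illustrative encoding only: the event labels and function name are hypothetical, and the score of 0 for a heart with no arrhythmia is an assumption, since the text defines scores 1 to 5 only.

```python
# Scores as stated in the text; "none": 0 is an assumption added for
# hearts in which no arrhythmia was observed.
ARRHYTHMIA_SCORES = {
    "none": 0,
    "pvc": 1,                # single premature ventricular complexes
    "bigeminy_or_salvos": 2,
    "vt": 3,                 # ventricular tachycardia
    "vf": 4,                 # ventricular fibrillation
    "spontaneous_vf": 5,
}


def arrhythmia_score(observed_events):
    """Return the score of the most severe arrhythmia observed in one heart."""
    return max((ARRHYTHMIA_SCORES[event] for event in observed_events),
               default=ARRHYTHMIA_SCORES["none"])


if __name__ == "__main__":
    # A heart showing isolated PVCs and one run of ventricular tachycardia.
    print(arrhythmia_score(["pvc", "vt", "pvc"]))
```

Taking the maximum rather than a sum means that many mild events never outrank a single severe one, which matches the "most severe type" wording of the scoring description.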

Optical mapping data analysis and statistics. Optical mapping data analysis was 
performed using two different commercially available analysis programs (BV_Analyze, 
Brainvision; and Optiq, Cairn). Vm and Ca²⁺ data sets were spatially aligned and
processed with a Gaussian spatial filter (radius 3 pixels). For both action potentials
and Ca²⁺ transients (CaTs), activation time was determined as the time at 50%
between diastolic and peak amplitude. Diastolic Ca²⁺ elevation was measured as
the percentage of diastolic Ca²⁺ increase relative to the following CaT amplitude at
baseline and 30 min post-treatment. The average diastolic Ca²⁺ elevation was calculated
for each heart by averaging all Ca²⁺ signals from the entire anterior surface of
the heart within the optical mapping field of view. PVC incidence was determined
from the continuous ECG recording as the number of PVCs that occurred during a 
15 min period of baseline activity (before initiation of treatment) and during the 
first 15 min of treatment. 
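
The two optical mapping measures defined above, activation time at 50% of the upstroke and diastolic Ca²⁺ elevation as a percentage of the following CaT amplitude, can be sketched as follows. This is a minimal single-pixel illustration and not the commercial software's algorithm: the function names are hypothetical, the linear interpolation between samples is an assumption, and the 1 ms sample interval follows from the stated 1 kHz sampling rate.

```python
def activation_time_ms(trace, dt_ms=1.0):
    """Time at which the upstroke crosses 50% between diastolic and peak level.

    `trace` is a sequence of fluorescence samples covering one beat.
    Linear interpolation between samples is an assumption of this sketch.
    """
    peak_idx = max(range(len(trace)), key=trace.__getitem__)
    # Diastolic level: lowest sample at or before the peak.
    dia_idx = min(range(peak_idx + 1), key=trace.__getitem__)
    diastolic, peak = trace[dia_idx], trace[peak_idx]
    if peak == diastolic:
        raise ValueError("trace has no upstroke")
    half = diastolic + 0.5 * (peak - diastolic)
    for i in range(dia_idx + 1, peak_idx + 1):
        if trace[i] >= half:
            prev = trace[i - 1]
            frac = (half - prev) / (trace[i] - prev)
            return (i - 1 + frac) * dt_ms
    raise ValueError("50% crossing not found")


def diastolic_elevation_percent(diastolic_rise, following_cat_amplitude):
    """Diastolic Ca2+ increase as a percentage of the following CaT amplitude."""
    if following_cat_amplitude <= 0:
        raise ValueError("CaT amplitude must be positive")
    return 100.0 * diastolic_rise / following_cat_amplitude


if __name__ == "__main__":
    beat = [0, 0, 0, 2, 6, 10, 9, 5, 1, 0]  # illustrative samples at 1 kHz
    print(activation_time_ms(beat))
    print(diastolic_elevation_percent(0.5, 2.0))
```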

All values are presented as mean ± s.e.m.; n values are generally biological replicates
(hearts, brains, animals, myocytes, cell preparations) as indicated in legends.
In addition, three technical replicates (triplicates) from three biological replicates
were used for some cellular Camui experiments in Fig. 1. Comparisons between
two groups of data were made using a Student’s t-test, paired where appropriate or
with repeated two-way analysis of variance. P < 0.05 was considered statistically
significant.
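
For readers less familiar with the summary statistics named above, the sketch below computes a mean ± s.e.m. and the t statistic for a paired comparison using only the Python standard library. It is an illustration under stated assumptions: the function names and data are hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and converting the t statistic to a P value would additionally require the t distribution with n − 1 degrees of freedom, which is not shown here.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # s.e.m. = sample standard deviation / sqrt(n)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def paired_t_statistic(before, after):
    """t statistic for paired samples: mean difference / s.e.m. of differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff, sem_diff = mean_sem(diffs)
    if sem_diff == 0:
        raise ValueError("differences have zero variance")
    return mean_diff / sem_diff


if __name__ == "__main__":
    baseline = [1.0, 2.0, 3.0]   # illustrative values only
    treated = [2.0, 4.0, 3.0]
    m, sem = mean_sem(treated)
    print(f"treated: {m:.2f} ± {sem:.2f} (mean ± s.e.m.)")
    print(f"paired t = {paired_t_statistic(baseline, treated):.3f}")
```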


28. Despa, S. et al. Hyperamylinemia contributes to cardiac dysfunction in obesity 
and diabetes: a study in humans and rats. Circ. Res. 110, 598-608 (2012). 

29. van Oort, R. J. et al. Ryanodine receptor phosphorylation by calcium/calmodulin-dependent
protein kinase II promotes life-threatening ventricular arrhythmias in
mice with heart failure. Circulation 122, 2669-2679 (2010). 

30. Curtis, M. J. & Walker, M. J. Quantification of arrhythmias using scoring systems: 
an examination of seven scores in an in vivo model of regional myocardial 
ischaemia. Cardiovasc. Res. 22, 656-665 (1988). 

Extended Data Figure 1 | O-GlcNAc effect is not abolished by T286A or
CM280/281VV mutation and CaMKII regulatory domain contains consensus
O-GlcNAc modification sites. a, Increased glucose concentration, but not
osmolarity-matched mannitol, activates CaMKII in HEK cells (n = 9).
b, O-GlcNAc-dependent CaMKII activation is reduced but still present in
T286A-mutant Camui (n = 9). WT, wild type. c, Glucose-dependent
CaMKII activation is preserved in CM280/281VV-mutant Camui expressed in
HEK-cell lysates (n = 9). d, Activation of Camui by increased glucose is blunted
in the S279A mutant and ablated entirely by DON (n values: wild type = 100,
wild type + DON = 72, S279A = 57, S279A + DON = 44 cells).
e, These sites are conserved in all known isoforms of CaMKII and in a wide
variety of mammalian species. Data are mean ± s.e.m. *P < 0.05, **P < 0.01
versus control.
Extended Data Figure 2 | O-GlcNAc modification of CaMKII is enhanced in hyperglycaemic conditions. a, Immunoblot with an O-GlcNAc-specific antibody shows O-GlcNAc modification of CaMKII is inducible by increased glucose availability and is enhanced by Iso treatment (n values indicated). b, O-GlcNAc modification of CaMKII is reversed by β-elimination reaction before immunoblot. c, O-GlcNAc modification of CaMKII is ablated by DON and enhanced by Thm-G. d, Autophosphorylation of cardiac CaMKII is significantly increased in a rat model of diabetes. n = 3 for all immunoblots except where indicated. Data are mean ± s.e.m. *P < 0.05, **P < 0.01 versus control.
Extended Data Figure 3 | ETD-MS analysis confirms O-GlcNAc modification at S279. A synthetic peptide encoding the regulatory domain of CaMKII was subjected to in vitro O-GlcNAc labelling followed by ETD-MS analysis. Examination of the 507.25 m/z peptide fragment (top right inset) indicates the presence of an O-GlcNAc modification at S279 (c6 to c7 fragmentation).
Extended Data Figure 4 | Sarcoplasmic reticulum Ca²⁺ content, sparks and twitch Ca²⁺ transients. a, b, Sarcoplasmic reticulum (SR) content is unaffected by Thm-G (a) or DON (b) in isolated rat myocytes (n values indicated). c, Mannitol does not enhance calcium spark frequency in isolated rat myocytes. d, e, Ca²⁺ transient amplitude (d, n = 13) and SR content (e, n = 13) are unaffected by Thm-G treatment in isolated myocytes from wild-type (WT) or CaMKIIδ-knockout mice. Data are mean ± s.e.m. NS, no significant difference.
Extended Data Figure 6 | Diastolic calcium elevation under high glucose is suppressed by pre-treatment with 50 mM DON. a, Average diastolic calcium elevation at baseline and following treatment with either high glucose (HG) or DON plus high glucose. Calcium elevation was measured as the percentage increase in the diastolic calcium signal relative to the amplitude of the following transient (n = 3). b, Example transients during baseline conditions (black) and after treatment with either high glucose or DON plus high glucose (grey). Minimal diastolic calcium elevation occurs after pre-treatment with DON. n = 3-4 rats for all data points. c, CaMKII activity is enhanced in heart lysate from diabetic rats (n = 3), and this effect is blunted by treatment with DON. Data are mean ± s.e.m. *P < 0.05 versus control.
A statin-dependent QTL for GATM expression is
associated with statin-induced myopathy


Lara M. Mangravite, Barbara E. Engelhardt, Marisa W. Medina, Joshua D. Smith, Christopher D. Brown,
Daniel I. Chasman, Brigham H. Mecham, Bryan Howie, Heejung Shim, Devesh Naidoo, QiPing Feng, Mark J. Rieder,
Yii-Der I. Chen, Jerome I. Rotter, Paul M. Ridker, Jemma C. Hopewell, Sarah Parish, Jane Armitage, Rory Collins,
Russell A. Wilke, Deborah A. Nickerson, Matthew Stephens & Ronald M. Krauss


Statins are prescribed widely to lower plasma low-density lipoprotein 
(LDL) concentrations and cardiovascular disease risk, and have
been shown to have beneficial effects in a broad range of patients.
However, statins are associated with an increased risk, albeit small,
of clinical myopathy and type 2 diabetes. Despite evidence for substantial
genetic influence on LDL concentrations, pharmacogenomic
trials have failed to identify genetic variations with large effects on
either statin efficacy or toxicity, and have produced little information
regarding mechanisms that modulate statin response. Here
we identify a downstream target of statin treatment by screening for
the effects of in vitro statin exposure on genetic associations with
gene expression levels in lymphoblastoid cell lines derived from 480
participants of a clinical trial of simvastatin treatment. This analysis
identified six expression quantitative trait loci (eQTLs) that inter- 
acted with simvastatin exposure, including rs9806699, a cis-eQTL 
for the gene glycine amidinotransferase (GATM) that encodes the 
rate-limiting enzyme in creatine synthesis. We found this locus to be 
associated with incidence of statin-induced myotoxicity in two sepa- 
rate populations (meta-analysis odds ratio = 0.60). Furthermore, 
we found that GATM knockdown in hepatocyte-derived cell lines 
attenuated transcriptional response to sterol depletion, demonstrat- 
ing that GATM may act as a functional link between statin-mediated 
lowering of cholesterol and susceptibility to statin-induced myopathy. 

Analysis of individual variation in transcriptional response to drug 
treatment has successfully identified regulatory genetic variants that 
interact with treatment in model organisms and human tissues.
Cellular transcriptional analysis may be particularly useful for investigating
genetic influences on statin efficacy, as statin-induced plasma
LDL lowering is controlled through sterol-response element binding
protein (SREBP)-mediated transcriptional regulation. Therefore, to
identify novel regulatory variants that interact with statin exposure, we
conducted a genome-wide eQTL analysis based on comparing simvastatin
exposure versus control exposure of 480 lymphoblastoid cell lines
(LCLs) derived from European American participants in the Cholesterol
and Pharmacogenetics (CAP) trial (http://www.clinicaltrials.gov/ct2/show/NCT00451828).
LCLs have proven to be a useful model system
for the study of genetic regulation of gene expression. Although non-genetic
sources of variation, if uncontrolled, may limit the utility of LCLs
for transcriptional perturbation analyses, there has been increasing
use of these cells to screen for genetic variants associated with molecular
response to drug intervention. Furthermore, many features of statin-mediated
regulation of cholesterol metabolism are operative in LCLs.


Simvastatin exposure had a significant effect on gene expression
levels for 5,509 of 10,195 expressed genes (54%, false discovery rate
(FDR) < 0.0001). The magnitude of change in expression across all
responsive genes was small (0.12 ± 0.08 mean absolute log2 change ± s.d.,
Fig. 1) with 1,952 genes exhibiting ≥10% change in expression and only
21 genes exhibiting ≥50% change in expression. Among the strongest
responders were 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR),
which encodes the direct target of simvastatin inhibition (0.49 ± 0.29
mean log2 change ± s.d., P < 0.0001, n = 480), and low density lipoprotein
receptor (LDLR), which encodes the receptor responsible for
internalization of LDL particles (0.50 ± 0.35 mean log2 change ± s.d.,
P < 0.0001). As expected, surface expression of the LDLR protein was
also increased following simvastatin exposure (1.6 ± 0.11 mean log2
change ± s.d., P < 0.0001, n = 474). Gene-set enrichment analysis showed
a treatment-dependent increase in expression of genes involved in
steroid biosynthesis, consistent with the mechanism responsible for
the lipid-lowering response to statin, and a decrease in expression of
genes involved in RNA splicing, consistent with evidence for statin
regulation of alternative splicing of genes involved in cellular cholesterol
homeostasis (Supplementary Fig. 1).

Figure 1 | Simvastatin treatment alters transcript expression in LCLs. Log2
change in expression between simvastatin- and control-exposed
lymphoblastoid cell lines (n = 480) displayed as a function of the log sum of
expression traits. Grey, genes for which expression was significantly changed in
response to simvastatin exposure (n = 5,509 genes, 0.12 ± 0.08 mean absolute
log2 change ± s.d., q < 0.0001); black, genes for which expression was not
significantly changed (n = 4,686); red, genes in the cholesterol biosynthesis
pathway, all of which exhibited significant changes in expression.
A.U., arbitrary units.
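
To relate the log-scale changes quoted for simvastatin response to the percentage thresholds in the same passage, a small conversion helps. The sketch below assumes the changes are base-2 logarithms of a treated/control ratio and that "percentage change" means the relative change in that ratio; both are reading assumptions, and the function names are hypothetical.

```python
import math

def log2_change_to_percent(log2_change):
    """Percentage change in expression implied by a log2 fold change."""
    return (2.0 ** log2_change - 1.0) * 100.0

def percent_to_log2_change(percent):
    """log2 fold change corresponding to a given percentage increase."""
    return math.log2(1.0 + percent / 100.0)

# Under these assumptions, the mean HMGCR response of 0.49 log2 units
# corresponds to roughly a 40% increase in transcript level.
hmgcr_percent = log2_change_to_percent(0.49)
```

On this reading, a 10% increase corresponds to a log2 change of about 0.14, which is close to the mean absolute change of 0.12 reported for responsive genes.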

We first identified eQTLs without considering whether they interact 
with simvastatin exposure. We computed Bayes factors to quantify
evidence for the association between every single-nucleotide polymorphism
(SNP) and the expression level of each gene, and we used permutations
to estimate FDRs (see Methods). This analysis identified 4,590
genes with cis-eQTLs, defined as eQTLs within 1 megabase (Mb)
of the gene’s transcription start or end site (FDR = 1%, log10 Bayes
factor ≥ 3.24, Supplementary Table 1). Statistical power to detect eQTLs
was substantially increased by controlling for known covariates and
unknown confounders (represented by principal components of the
gene expression data) and by testing for association with expression
traits averaged across paired simvastatin- and control-exposed samples
to reduce measurement error (Supplementary Table 2 and Supplementary
Fig. 2). Our analysis also identified 98 trans-eQTLs at the same stringent
FDR (FDR = 1%, log10 Bayes factor ≥ 7.20, Supplementary Table 3).

To identify eQTLs that interact with simvastatin exposure (that is, 
eQTLs with different effects in control- versus simvastatin-exposed 
samples, or differential eQTLs), we used two approaches: first, univariate
association mapping of log fold expression change between paired
control- and simvastatin-exposed samples; and second, bivariate association
mapping of paired control- and simvastatin-exposed samples.
This bivariate approach aims to improve power and interpretability by
explicitly distinguishing among different modes of interaction (see
Methods), which the univariate approach does not distinguish. The univariate
approach identified differential cis-eQTLs for four genes: GATM,
RSRC1, VPS37D and OR11L1 (FDR = 20%, log10 Bayes factor ≥ 4.9,
Supplementary Tables 4 and 5). No differential trans-eQTLs were identified
at an FDR of 20%, so trans analyses were not pursued further (see
Supplementary Table 6 for top differential trans-eQTLs). The bivariate
approach identified differential cis-eQTLs for six genes (FDR = 20%,
log10 Bayes factor ≥ 5.1; Supplementary Tables 4 and 7, Supplementary
Fig. 3 and Supplementary Data), including two genes that were not
identified in the univariate analysis: ATP5SL and ITFG2. Both GATM and
VPS37D had significantly stronger eQTL associations under simvastatin-exposed
conditions in comparison to control, whereas the other four genes
had significantly stronger eQTL associations under control-exposed
conditions (Fig. 2a, Supplementary Table 4 and Supplementary Fig. 3).
As in similar studies, we found many fewer differential eQTLs
than stable eQTLs, or SNPs with similar effects across both conditions. 
The finding of relatively few gene by exposure interactions, and of 
relatively modest effect sizes of those interactions, seems remarkably 
consistent across studies regardless of method (including family-based 
comparisons), exposure, sample size, sample source, or the number of 
stable eQTLs detected. We focus further analysis on our most signifi- 
cant differential association from the bivariate model, the GATM locus, 
for which we observed stronger evidence for eQTL association after 
statin exposure and for which there was evidence for biological rele- 
vance to pathways involved in lipoprotein metabolism and myopathy 
(see Supplementary data). 

Figure 2 | Treatment-specific QTL associated with GATM expression.
a, Association of rs9806699 with quantile normalized GATM expression levels
following control exposure (left panel, not significant); simvastatin exposure
(middle left panel, log10 Bayes factor (BF) = 5.1, effect size = −0.43); fold
change (middle right panel, log10 BF = 5.7, effect size = −0.40); control versus
simvastatin-exposed GATM expression (right panel; black, GG, n = 225; red,
GA, n = 207; green, AA, n = 48). Box height and whiskers are described in
Supplementary Methods. b, Top panel, SNPs associated with GATM
expression (log10 BF, left y axis); SNPs associated with statin-induced myopathy
(red); significance threshold (dotted line); recombination rates in centimorgans
(cM) per megabase (Mb) (blue, right y axis). Bottom panel, transcribed genes
(green); DNase I hypersensitive (DHS) sites and transcription factor binding
sites (TFBS; black); predicted chromosomal enhancers (orange) and promoters
(red) as identified in hepatocyte (HepG2), lymphoblastoid (GM12878), and
myocyte (HSMM) cell lines, using ChromHMM software (see Methods).

Table 1 | Associations of SNPs at the GATM locus with statin-induced myopathy.

Cohort         Cases (n)  Controls (n)  SNP        Position            LD (r²)  MAF (cases)  MAF (controls)  Effect size       P value
Marshfield     72         220           rs9806699  Chr 15: 43,527,684  1.0      0.21         0.30            0.61 (0.39-0.95)  3.2 × 10⁻²
               72         220           rs1719247  Chr 15: 43,408,027  0.76     0.19         0.29            0.59 (0.36-0.93)  2.4 × 10⁻²
               72         220           rs1346268  Chr 15: 43,460,321  0.80     0.21         0.29            0.66 (0.41-1.02)  6.4 × 10⁻²
SEARCH         100        4,021         rs1719247  Chr 15: 43,408,027  0.70     0.17         0.25            0.61 (0.42-0.88)  1.0 × 10⁻²
               100        4,029         rs1346268  Chr 15: 43,460,321  0.74     0.18         0.26            0.62 (0.43-0.90)  1.0 × 10⁻²
Meta-analysis                           rs1719247  Chr 15: 43,408,027           0.18         0.25            0.60 (0.45-0.81)  7.0 × 10⁻⁴
                                        rs1346268  Chr 15: 43,460,321           0.19         0.26            0.63 (0.48-0.84)  1.8 × 10⁻³

Differential eQTL associations with GATM expression in CAP were: log10 Bayes factor = 6.22 (rs9806699), log10 Bayes factor = 4.35 (rs1719247), and log10 Bayes factor = 5.96 (rs1346268). All SNPs were in
Hardy-Weinberg equilibrium in these populations. Effect size reported as odds ratio with 95% confidence interval in parentheses. Chr, chromosome; LD, linkage disequilibrium with respect to the top differential
eQTL SNP, rs9806699, based on Pearson correlation (r²).

GATM encodes glycine amidinotransferase, an enzyme that is required
for the synthesis of creatine. We observed evidence for differential eQTL
association with GATM (log10 Bayes factor > 5.1) across a group of
51 SNPs within the GATM locus that are in linkage disequilibrium
(chromosome 15: 45627979-45740392, hg19, r² = 0.85-0.99, n = 587).
The most significant differential eQTL association was observed with 
SNP rs9806699 (minor allele frequency (MAF) = 0.32), for which we
observed stronger evidence for an association with GATM expression
following simvastatin exposure (log10 Bayes factor = 5.1, effect
size = −0.43) than following control exposure (log10 Bayes factor = 0.52,
effect size = −0.17, Fig. 2a). SNPs at this locus also had a stable association
with expression of a neighbouring gene, SPATA5L1 (differential
eQTL rs9806699 log10 Bayes factor = −0.33, stable eQTL rs9806699
log10 Bayes factor = 21.75, Supplementary Fig. 4). This locus has been
shown previously to be associated with reduced glomerular filtration
rate (GFR) with a small effect size (<1%). This association was specific
to GFR as estimated from plasma creatinine but not from cystatin C, a
second biomarker of renal function, suggesting that the association 
was related to variation in creatinine production rather than renal 
elimination. We found evidence for SNP differential association with 
GATM that spans the GATM coding region and includes multiple 
SNPs located within DNase I hypersensitive sites, active promoters 
and several alternative GATM transcription start sites (Fig. 2b). 
Phosphorylation of creatine, the primary downstream product of 
GATM activity, is a major mechanism for energy storage in muscle 
and is mediated by creatine kinase, the primary plasma biomarker of 
statin-induced myopathy. To test the relationship of this locus with 
statin-induced myopathy, we examined the association of the GATM 
differential eQTL locus with statin-induced myopathy in a population- 
based cohort comprising 72 cases of myopathy and 220 matched
controls (Marshfield cohort). In this cohort, we observed that the
minor allele at the GATM differential eQTL locus was associated with
reduced incidence of statin-induced myopathy (odds ratio = 0.61, 95%
confidence interval = 0.39-0.95, P = 0.03; Table 1). This association
was replicated in a second cohort consisting of 100 cases of myopathy
identified within the Study of Effectiveness of Additional Reductions in
Cholesterol and Homocysteine (SEARCH; http://clinicaltrials.gov/ct2/show/NCT00124072)
(odds ratio for rs1719247 = 0.61, confidence
interval = 0.42-0.88, P = 0.01; r² = 0.70 to rs9806699; Table 1). Meta-analysis
of these two cohorts showed an overall odds ratio of 0.60
(confidence interval = 0.45-0.81, P = 6 × 10⁻⁴, log10 Bayes factor = 1.5,
Table 1). As myopathy is defined in part through elevation in plasma
creatine kinase concentrations, we also tested for a direct association of
this locus with this enzyme in statin-treated populations in which
myopathy was not observed. Within CAP (40 mg per day simvastatin
exposure for 6 weeks), no association of rs9806699 was observed with
plasma creatine kinase either before simvastatin exposure (n = 575,
P = 0.83) or following exposure (n = 574, P = 0.48). This lack of association
was confirmed in a second statin study (Justification for the Use
of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin
(JUPITER) trial, 20 mg per day rosuvastatin, median follow-up = 1.9 years;
http://clinicaltrials.gov/show/NCT00239681) both before rosuvastatin
exposure (n = 8,504, P = 0.54) and after treatment (n = 3,052, P = 0.83).
These findings suggest that the observed association of the GATM locus 
with risk for statin-induced myopathy is independent of an association 
with plasma creatine kinase. Although the present studies do not address
the mechanism for the link between reduced GATM expression and
protection from statin-induced myopathy, it is thought that diminished
capacity for phosphocreatine storage modifies cellular energy storage
and adenosine monophosphate-activated protein kinase (AMPK)
signalling in a manner that is protective against cellular stress as
induced by glucose deprivation or, potentially, by cholesterol depletion.
Given that myocellular creatine stores are predominantly derived
from renal and hepatic creatine biosynthesis, these results raise the
possibility that statins may predispose to muscle toxicity in part through
metabolic effects in the liver, the major site of statin’s pharmacologic
actions (Supplementary Fig. 5). Conversely, the finding of severe myopathy
in two cases of extreme genetic GATM deficiency suggests that
this protective effect may be overcome if creatine synthesis is insufficient
to support myocellular energy needs.
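
The pooled odds ratio quoted for the two myopathy cohorts can be approximated from the per-cohort estimates alone. The sketch below uses a fixed-effect, inverse-variance meta-analysis on the log odds-ratio scale, with standard errors recovered from the 95% confidence intervals. The text does not state which meta-analysis method the authors used, so this choice and the function name are assumptions made for illustration.

```python
import math

def pool_odds_ratios(estimates, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    estimates: iterable of (odds_ratio, ci_lower, ci_upper) with 95% intervals.
    Returns (pooled_or, ci_lower, ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in estimates:
        # Standard error of log(OR) recovered from the width of the interval.
        se = (math.log(upper) - math.log(lower)) / (2.0 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    half_width = z / math.sqrt(total_weight)
    return (math.exp(pooled_log_or),
            math.exp(pooled_log_or - half_width),
            math.exp(pooled_log_or + half_width))

# rs1719247 rows of Table 1: Marshfield, then SEARCH.
pooled = pool_odds_ratios([(0.59, 0.36, 0.93), (0.61, 0.42, 0.88)])
```

With the rs1719247 values from Table 1 this lands close to the reported 0.60 (0.45-0.81), which suggests, but does not prove, that a comparable pooling scheme was used.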

Given the influence of statin exposure on regulation of GATM expres- 
sion, we next tested whether GATM may modulate sterol-mediated 
changes in cholesterol homeostasis. Knockdown of GATM in hepatocyte- 
derived cell lines (HepG2 and Huh7) resulted in reduced upregulation 
of SREBP-responsive genes (HMGCR, LDLR and SREBP2) by sterol 
depletion (Fig. 3a). Moreover, GATM knockdown decreased media 
accumulation of apolipoprotein B (apoB), the major structural protein 
of LDL, in both cell lines (P < 0.05; Fig. 3b), but did not alter levels of 
apoAI, the major structural protein in high density lipoproteins (HDLs,
Fig. 3b). An effect of GATM deficiency on cholesterol and lipoprotein
metabolism is supported further by a recent study describing reduced
plasma cholesterol concentrations in GATM knockout mice.

Figure 3 | GATM knockdown attenuated sterol-mediated induction of
expression of SREBP-responsive genes. a, Changes in transcript
concentrations following sterol depletion through 24 h exposure to lipoprotein
deficient serum (LPDS)-containing media versus standard FBS-containing
media in hepatocyte-derived HepG2 (left, n = 12) and Huh7 (right, n = 12) cell
lines. Asterisk indicates P < 0.05 for the comparison of GATM versus
non-targeting control (NTC) siRNA treated cells. b, Fold changes in
accumulation of apolipoprotein B (ApoB) and apolipoprotein AI (ApoAI) in
media after gene knockdown with GATM versus NTC siRNA in HepG2 cells
(left, n = 6-10) or Huh7 cells (right, n = 4-6) under standard culture
conditions. Experiments repeated 2-3 times with 2-8 biological replicates each.
Data presented as average values. Error bars, s.e.m.

In summary, this study has provided evidence that functionally signi- 
ficant genetic effects can be discovered using a novel cell-based screen 
for gene-by-treatment effects on transcriptional expression. This approach 
has led to the identification of GATM as a genetic locus associated with 
statin-induced myopathy, and as a potential link between cellular cho- 
lesterol homeostasis and energy metabolism. 


METHODS SUMMARY 


Gene expression levels were measured using the Illumina HumanRef-8v3 beadarray
in 480 lymphoblastoid cell lines derived from European American participants
in CAP, a 6-week trial of simvastatin (40 mg per day), after 24 h exposure to 2 µM
activated simvastatin or control buffer. Treatment-specific effects were modelled 
after adjustment for known covariates and unknown confounding variables using 
linear regression, and eQTLs were identified using the BIMBAM software after 
imputing the available genotypes. Differential eQTLs were identified using the 
BIMBAM software (univariate test) as well as linear models of differential associa- 
tion (bivariate tests). Associations with myopathy were tested in two cohorts contai- 
ning cases of statin-induced myopathy (definitions of myopathy in Marshfield and 
SEARCH are described in the Supplementary Methods), and associations with 
plasma creatine kinase were tested in two statin trials that did not contain myo- 
pathy cases (CAP and JUPITER). Media accumulation of apolipoproteins was 
measured by enzyme-linked immunosorbent assay (ELISA) and gene expression 
was measured by quantitative PCR in hepatoma cell lines (HepG2 and Huh7) 
after GATM knockdown as achieved by 48 h transfection of Ambion Silencer Select
short interfering RNA (siRNA) or non-targeting control. See full Methods for 
complete details. 


Full Methods and any associated references are available in the online version of 
the paper. 
In vitro simvastatin exposure of lymphoblastoid cell lines. Lymphoblastoid cell
lines (LCLs), immortalized by Epstein-Barr virus transformation of lymphocytes
isolated from whole blood, were derived from European American participants in
the CAP trial, a 6-week simvastatin trial of 40 mg per day (Supplementary Table 8).
Simvastatin was provided by Merck, converted to active form (beta-hydroxy simvastatin
acid, SVA) and quantified by liquid chromatography-tandem mass spectrometry
as described previously. LCLs were normalized to a uniform cell density and
exposed to 2 µM SVA (simvastatin-exposed) or control buffer (control-exposed)
for 24 h as described previously. This concentration was selected by assessing
dose-response effects on expression profiles (n = 8 LCLs, 4 doses), wherein a more
robust change in expression profiles was observed with 2 µM simvastatin exposure
(7.8% of genes, q = 0.001) than lower doses (<0.1% of genes for 0.02 µM or 0.2 µM,
q = 0.001, data not shown). Pre-experiment cell density was recorded as a surrogate
for cell growth rate. After exposure, cells were lysed in RNAlater (Ambion), and
RNA was isolated using the Qiagen miniprep RNA isolation kit with column DNase
treatment.

Expression profiling and differential expression analysis. RNA quantity and
quality were assessed using a Nanodrop ND-1000 spectrophotometer and an
Agilent Bioanalyzer, respectively. Paired RNA samples, selected based on RNA 
quality and quantity, were amplified and labelled with biotin using the Illumina 
TotalPrep-96 RNA amplification kit, hybridized to Illumina HumanRef-8v3 beadarrays
(Illumina), and scanned using an Illumina BeadXpress reader. Data were
read into GenomeStudio and samples were selected for inclusion based on quality-control
criteria: signal to noise ratio (95th:5th percentiles); matched gender between
sample and data; and average correlation of expression profiles within three standard
deviations of the within-group mean (r = 0.99 ± 0.0093 for control-exposed
and r = 0.98 ± 0.0071 for simvastatin-exposed beadarrays). In total, viable expression
data were obtained from 1,040 beadarrays including 480 sets of paired samples
for 10,195 genes. Genes were annotated through biomaRt from Ensembl Build 54
(http://may2009.archive.ensemble.org/biomart/martview). Treatment-specific effects
were modelled from the data following adjustment for known covariates using
linear regression. False discovery rates were calculated for differentially expressed
transcripts using the qvalue package. Ontological enrichment in differentially
expressed gene sets was measured using GSEA (1,000 permutations by phenotype)
using gene sets representing Gene Ontology biological processes as described in the
Molecular Signatures v3.0 C5 Database (10-500 genes per set).

Expression QTL mapping. For association mapping, we use a Bayesian approach
implemented using the software package BIMBAM that is robust to poor imputation
and small minor-allele frequencies. Gene expression data were normalized
as described in the Supplementary Methods for the control-treated (C480) and 
simvastatin-treated (T480) data and used to compute D480 = T480 — C480 and 
S480 = T480 + C480, where T480 represents the adjusted simvastatin-treated data 
and C480 represents the adjusted control-treated data. SNPs were imputed as 
described in the Supplementary Methods. To identify eQTLs and differential eQTLs, 
we measured the strength of association between each SNP and gene in each 
analysis (control-treated, simvastatin-treated, averaged, and difference) using 
BIMBAM with default parameters. BIMBAM computes the Bayes factor for an
additive or dominant response in expression data as compared with the null, which 
is that there is no correlation between that gene and that SNP. BIMBAM averages 
the Bayes factor over four plausible prior distributions on the effect sizes of additive 
and dominant models. We used a permutation analysis (see Supplementary Methods) 
to determine cutoffs for eQTLs in the averaged analysis (S480) at an FDR of 1% for 
cis-eQTLs (log10 Bayes factor > 3.24) and trans-eQTLs (log10 Bayes factor > 7.20).
For cis-eQTLs, we considered the largest log10 Bayes factor above the cis-cutoff for
any SNP within 1 Mb of the transcription start site or the transcription end site of
the gene under consideration. For trans-eQTLs, we considered the largest log10 Bayes
factor above the trans-cutoff for any SNP, and if that SNP was in the cis-neighbourhood
of the gene being tested, we ignored any potential trans-associations; there were
6,130 genes for which the SNP with the largest log10 Bayes factor was not in cis with
the associated gene. Correspondingly, we only considered those 6,130 genes when 
computing the permutation-based FDR for the trans-associations. 
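
The idea of choosing a Bayes-factor cutoff from permutations can be illustrated generically. The actual procedure is described in the Supplementary Methods, which are not reproduced here, so the following is a sketch of one common permutation-FDR scheme and not the authors' implementation. The function name, the per-gene best-score inputs and the FDR estimator (mean number of permuted scores passing the cutoff divided by the number of observed scores passing it) are all assumptions.

```python
import numpy as np

def fdr_threshold(observed, permuted, target_fdr=0.01):
    """Smallest observed score t whose estimated FDR is at or below target_fdr.

    observed: 1-D array of per-gene best scores (e.g. log10 Bayes factors).
    permuted: array of shape (n_permutations, n_genes) holding the same
              statistic recomputed after permuting sample labels.
    Returns None if no cutoff reaches the target.
    """
    observed = np.asarray(observed, dtype=float)
    permuted = np.atleast_2d(np.asarray(permuted, dtype=float))
    for t in np.sort(observed):
        n_observed = np.sum(observed >= t)                   # discoveries at this cutoff
        n_null = np.sum(permuted >= t) / permuted.shape[0]   # expected false discoveries
        if n_null / n_observed <= target_fdr:
            return float(t)
    return None
```

Restricting `observed` and `permuted` to the genes eligible for a given analysis (for example, only genes whose top SNP is not in cis) mirrors the restriction described above for trans-associations.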

Differential expression QTL mapping. We define cis-SNPs as being within 1 Mb 
of the transcription start site or end site of that gene. To identify differential QTLs, 
we first computed associations between all SNPs and the log fold change using 
BIMBAM as above. 

We then considered a larger set of models for differential eQTLs. The associa- 
tions for the genes in Supplementary Fig. 3 indicate that there are a few possible 
patterns of differential association. Although these patterns may have different 
mechanistic or phenotypic interpretations, they are not distinguished by a test of 
log fold change. We used the interaction models introduced in another paper to
compute the statistical support (assessed with Bayes factors) for the four alternative 
eQTL models described above versus the null model (no association with genotype). 
These methods are based on a bivariate normal model for the treated data (T) and
control-treated data (U). Note that simply quantile transforming T and U to a
standard normal distribution is not sufficient to ensure that they are jointly bivariate
normal, and so we used the following more extensive normalization procedure.
Let D = qT − qU and S = qT + qU, where q indicates that the vector following
it has been quantile normalized. We then quantile normalize and scale D and S to
produce S = (σ_S qS) and D = (σ_D qD), where σ_S, σ_D are robust estimates of the standard
deviations of S and D, respectively (specifically, they are the median absolute
deviation multiplied by 1.4826). Note that this transformation ensures that S and D
are univariate normal. Furthermore, they are approximately independent, which 
ensures that they are also bivariate normal. Finally, let U=0.5(S - D) and 
T=0.5(S + D). 
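
The normalization steps just described can be sketched as follows. This is a minimal illustration, not the authors' code: the rank-based inverse-normal form of the quantile transform, the tie handling, and all function names are assumptions.

```python
import numpy as np
from statistics import NormalDist

def quantile_normalize(x):
    """Map x onto standard normal quantiles by rank (ties broken by position)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    probs = (ranks - 0.5) / len(x)             # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(p)) for p in probs])

def robust_sd(x):
    """Median absolute deviation scaled by 1.4826, as stated in the text."""
    x = np.asarray(x, dtype=float)
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def normalize_pair(treated, control):
    """Return (T, U, D, S) after the difference/sum normalization."""
    qT, qU = quantile_normalize(treated), quantile_normalize(control)
    D, S = qT - qU, qT + qU
    # Quantile normalize D and S again, then rescale by their robust s.d.
    D_scaled = robust_sd(D) * quantile_normalize(D)
    S_scaled = robust_sd(S) * quantile_normalize(S)
    U = 0.5 * (S_scaled - D_scaled)
    T = 0.5 * (S_scaled + D_scaled)
    return T, U, D_scaled, S_scaled
```

By construction, the returned T and U satisfy T − U = D and T + U = S for the rescaled difference and sum, which is what lets the later linear models move freely between the (T, U) and (D, S) parameterizations.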

The Bayes factor when the eQTL effect is identical in the two conditions (model 1) 
uses the linear model L(S ~ D + g), where g is the vector of genotypes at a single 
SNP. The Bayes factor when the eQTL is only present in the control-treated 
samples (model 2) uses the model L(U ~ T + g). The Bayes factor when the
eQTL is only present in the simvastatin-treated samples (model 3) uses the model
L(T ~ U + g). The Bayes factor when the eQTL effect is in the same direction but
unequal in strength (model 4) uses the model L(D ~ S + g). We averaged each 
Bayes factor for each gene and each cis-SNP over four plausible effect size priors 
(0.05, 0.1, 0.2 and 0.4). 

To find eQTLs that interact with treatment (that is, those that conform best 
to one of the differential models 2-4, rather than the null model or the stable 
model) we defined an interaction Bayes factor (IBF) = 2(Bayes factor_2 + Bayes
factor_3 + Bayes factor_4)/[3(Bayes factor_1 + 1)], where Bayes factor_i denotes the Bayes
factor for model i compared with the null model (the 1 in the denominator represents
the null model Bayes factor_0). Large values of the IBF represent strong support
for at least one interaction model (2-4) compared with the two non-interacting 
models (0-1), and hence strong support for a differential association. 
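
Read as a ratio of averages, the interaction Bayes factor compares the mean Bayes factor of the three interaction models (2-4) with the mean of the two non-interacting models (0-1, where the null has Bayes factor 1 by definition). A minimal sketch, with a hypothetical function name and Bayes factors supplied on the raw scale, not the log10 scale:

```python
import math

def interaction_bayes_factor(bf1, bf2, bf3, bf4):
    """IBF = mean(BF2, BF3, BF4) / mean(BF1, BF0), with BF0 = 1 for the null model.

    Algebraically identical to 2 * (bf2 + bf3 + bf4) / (3 * (bf1 + 1)).
    """
    interacting = (bf2 + bf3 + bf4) / 3.0
    non_interacting = (bf1 + 1.0) / 2.0
    return interacting / non_interacting

# Illustrative values only: strong support for one interaction model and
# weak support for the stable model.
ibf = interaction_bayes_factor(bf1=0.5, bf2=10.0, bf3=20.0, bf4=30.0)
log10_ibf = math.log10(ibf)
```

When all four models have a Bayes factor of 1 the IBF is exactly 1, so values well above 1 indicate that the interaction models are favoured on average.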
Association with statin-induced myopathy. For the Marshfield Cohort”, cases 
of myopathy were identified from electronic medical records of patients treated at 
the Marshfield Clinic (Wisconsin, USA) using a combination of automated natural 
language processing and manual review as described previously. Seventy-two 
cases of incipient myopathy (creatine kinase concentrations greater than 3-fold 
normal concentrations, with evidence in the charts of muscle complaints) were 
identified for which patients were not also undergoing treatment with concomitant 
drugs known to increase incidence of statin-induced myopathy (fibrates or niacin). 
Controls were matched based on statin exposure, age and gender. This study was 
approved by the Marshfield Clinic institutional review board. The study popu- 
lation included residents living in Central and Northern Wisconsin, served by the 
Marshfield Clinic, a large multi-specialty group practice. For the SEARCH and 
Heart Protection Study Collaborative Groups, a total of 100 myopathy cases 
were identified from participants with genotyping data in the SEARCH trial, 
including 39 definite myopathy cases (creatine kinase > 10 × upper limit of nor- 
mal (ULN) with muscle symptoms) and 61 incipient myopathy cases (defined as 
creatine kinase ≥ 5 times baseline value and alanine transaminase ≥ 1.7 times 
baseline value and creatine kinase > 3 × ULN). Genotypes were available from 
the Illumina Human610-Quad BeadChip for 25 myopathy cases (12% of which had 
definite myopathy) and from the Illumina HumanHap300-Duo BeadChip for 75 
myopathy cases (48% of which had definite myopathy). Genotypes for rs9806699 
were only available for the 25 cases genotyped on the Illumina Human610-Quad 
BeadChip, so proxy SNPs were used for analyses in Table 1. Analyses of rs9806699 
are provided in Supplementary Table 9. All myopathy cases were compliant with 
statin therapy (95 myopathy cases occurred while the patient was taking simvas- 
tatin 80 mg daily, and 5 cases while taking simvastatin 20 mg daily). Controls were 
identified from the SEARCH Study as well as from the Heart Protection Study 
(where considerably more participants had been genotyped). Controls from the 
Heart Protection Study had similar baseline characteristics to those in the 
SEARCH Study and inclusion of this large number of additional controls improved 
statistical power. Multi-centre ethics approval was obtained from the South East 
Research Ethics Committee for the SEARCH study, and from the local ethics com- 
mittees covering each of the 69 UK hospitals involved in the Heart Protection Study. 
Genetic associations were determined by chi-squared analysis using an additive 
model. A meta-analysis was performed using a random effects model and, for the 
Bayesian analysis, we used an expected effect size of 0.2. Associations of rs9806699 
with plasma creatine kinase in the CAP and JUPITER trials were also assessed 
using linear regression. The CAP trial (ClinicalTrials.gov number NCT00451828) 
was approved by the institutional review boards located at Children’s Hospital 
Oakland Research Institute (Oakland, California) and all enrollment sites. The 
JUPITER trial (ClinicalTrials.gov number NCT00239681) was approved by the 
Institutional Review Board of Brigham and Women’s Hospital. Informed consent 
was obtained from all participants in all trials. 
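
The text does not say which chi-squared test was used for the additive model. One common choice is the Cochran-Armitage trend test with genotype scores 0, 1 and 2, sketched below as an assumption rather than as the authors' method. The function name and the example counts in the usage note are hypothetical.

```python
from math import erfc, sqrt


def additive_trend_test(case_counts, control_counts):
    """Cochran-Armitage trend test with genotype scores 0, 1, 2.

    case_counts and control_counts hold the number of individuals
    carrying 0, 1 and 2 copies of the tested allele.
    Returns (chi-squared statistic with 1 d.f., P value).
    """
    scores = (0, 1, 2)
    totals = [a + b for a, b in zip(case_counts, control_counts)]
    n = sum(totals)          # all individuals
    r = sum(case_counts)     # all cases
    s_tr = sum(t * c for t, c in zip(scores, case_counts))
    s_tn = sum(t * m for t, m in zip(scores, totals))
    s_t2n = sum(t * t * m for t, m in zip(scores, totals))
    numerator = n * (n * s_tr - r * s_tn) ** 2
    denominator = r * (n - r) * (n * s_t2n - s_tn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

With illustrative counts of 10, 20 and 30 cases against 30, 20 and 10 controls, the statistic is 20; with identical case and control counts it is 0.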
Genomic characterization of GATM locus. Cis-regulatory elements were down- 
loaded from the ChromHMM (ref. 39) track of the UCSC Genome Browser 
(ref. 40) and aggregated manually. 

Functional analysis of candidate genes. GATM knockdown was achieved by 48 h 
transfection of Ambion Silence Select siRNA or non-targeting control into 80,000 
HepG2 or Huh7 cells per well in 12-well plates. To assess the influence of sterol 
depletion, cell culture medium was replaced with medium containing 10% lipo- 
protein deficient serum (Hyclone) or fetal bovine serum (Omega Scientific) at 24 h 
after transfection. All samples were harvested 48 h post transfection. Transcript 
levels were quantified by quantitative PCR and normalized to CLPTM. Cell culture 
medium was taken from all samples at the time of collection, and ApoB (MP 
Biomedicals) and ApoAI (Meridian Life Sciences) were quantified in triplicate 
by sandwich-style ELISA. Samples with a coefficient of variation greater than 15% 
were subjected to repeat measurement. 
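
The repeat-measurement rule for the triplicate ELISA readings can be expressed as a small check. This is a sketch only: it assumes the coefficient of variation is computed from the sample standard deviation, which the text does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 15.0  # threshold quoted in the text


def coefficient_of_variation(replicates):
    """Percentage CV, using the sample standard deviation (an assumption)."""
    return 100.0 * stdev(replicates) / mean(replicates)


def needs_repeat(replicates, threshold=CV_THRESHOLD_PERCENT):
    """True when the replicate CV exceeds the threshold."""
    return coefficient_of_variation(replicates) > threshold
```

For illustrative readings of 10, 12 and 14 the CV is about 16.7%, so the sample would be measured again.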
HELQ promotes RAD51 paralogue-dependent repair 
to avert germ cell loss and tumorigenesis 


Carrie A. Adelman, Rafal L. Lolo, Nicolai J. Birkbak, Olga Murina, Kenichiro Matsuzaki, Zuzana Horejsi, Kalindi Parmar, 
Valérie Borel, J. Mark Skehel, Gordon Stamp, Alan D’Andrea, Alessandro A. Sartori, Charles Swanton & Simon J. Boulton 


Repair of interstrand crosslinks (ICLs) requires the coordinated 
action of the intra-S-phase checkpoint and the Fanconi anaemia 
pathway, which promote ICL incision, translesion synthesis and 
homologous recombination (reviewed in refs 1, 2). Previous studies 
have implicated the 3’-5’ superfamily 2 helicase HELQ in ICL repair 
in Drosophila melanogaster (MUS301 (ref. 3)) and Caenorhabditis 
elegans (HELQ-1 (ref. 4)). Although in vitro analysis suggests that 
HELQ preferentially unwinds synthetic replication fork substrates 
with 3’ single-stranded DNA overhangs and also disrupts protein- 
DNA interactions while translocating along DNA, little is known 
regarding its functions in mammalian organisms. Here we report 
that HELQ helicase-deficient mice exhibit subfertility, germ cell attri- 
tion, ICL sensitivity and tumour predisposition, with Helq hetero- 
zygous mice exhibiting a similar, albeit less severe, phenotype to 
the null, indicative of haploinsufficiency. We establish that HELQ 
interacts directly with the RAD51 paralogue complex BCDX2 and 
functions in parallel to the Fanconi anaemia pathway to promote 
efficient homologous recombination at damaged replication forks. 
Thus, our results reveal a critical role for HELQ in replication- 
coupled DNA repair, germ cell maintenance and tumour suppres- 
sion in mammals. 

To examine the effect of HELQ deficiency in vertebrates, we generated 
a HelqΔC-deficient mouse strain that is truncated at the carboxy terminus 
of HELQ (Fig. 1a, b and Extended Data Fig. 1a, b) and results in loss of 
detectable HELQ protein (Fig. 1d and Extended Data Fig. 1c). Although 
HelqΔC/ΔC mice are viable (Fig. 1c), are born in normal Mendelian 
ratios and lack growth or developmental abnormalities (Extended 
Data Fig. 1d, e), breeding experiments with HelqΔC/ΔC mutant pairs 
revealed a fertility defect. Eight heterozygous and 8 homozygous pairs 
were mated continuously for 5-6 months, resulting in 320 offspring 
in the case of heterozygotes (an average of 6.1 litters and 40 pups each) 
but only 38 pups in the case of homozygotes (1.4 litters and 4.7 pups 
per pair). Mating of mutants to control animals revealed that females 
contribute more to this phenotype than males (Fig. 1e). 

Consistent with a fertility defect, HelqΔC/ΔC testes were smaller than 
those of wild-type males (0.58% of body weight for wild type versus 
0.38% for mutants (Fig. 1f)). Histological analysis of testes revealed many 
normal tubules but also regions of atrophy in the mutants (Fig. 1g and 
Extended Data Fig. 1g-l). Dysgenesis/atrophy was even more pronounced 
in HelqΔC/ΔC ovaries (Fig. 1g and Extended Data Fig. 1f). A possible 
stem cell origin was investigated as no particular subset of spermato- 
cytes appeared affected (Extended Data Fig. 1g-l). Indeed, HelqΔC/ΔC 
adults had significantly fewer c-KIT (also known as KIT)+ spermato- 
gonia than controls (Extended Data Fig. 2a, b). As atrophy was not 
linked to ageing (Extended Data Fig. 2c), a developmental origin was 
examined; tubules from 5-day-old wild-type mice contained sixfold 
more spermatogonia than mutants (Fig. 1h), indicating that atrophic 


tubules in mutant adults may primarily arise from reduced spermato- 
gonial stem cell pools during development. 

The effect of HELQ deficiency during organismal ageing revealed 
that tumour-free survival was significantly reduced in Helq mutants 
(Fig. 1i and Extended Data Fig. 2d), with twice as many HelqΔC/ΔC mice 
developing two or more primary tumours in comparison to controls 
(Fig. 1j). Ovarian tumours (resembling granulosa and other sex cord 
stromal tumours; Extended Data Fig. 3b-f) and pituitary adenomas 
(Extended Data Fig. 3g-j) were the most prominent tumour types in 
female mice, with incidences of 40% in the case of ovarian tumours and 
30% in the case of pituitary tumours (Fig. 1k). Unexpectedly, Helq+/ΔC 
heterozygous females also presented with ovarian pathology similar to 
that of younger mutant females (Extended Data Fig. 2d). Pathology 
included cystic (4 out of 7 mice) and dysgenic/atrophic (5 out of 7) 
ovaries with few or no maturing follicles (7 out of 7) and luteinized 
stroma (2 out of 7). Helq+/ΔC heterozygous females also frequently 
displayed pituitary (5 out of 7 mice), Harderian gland (3 out of 7) and 
gastrointestinal (3 out of 7) adenomas, polyps and hyperplasias. Although 
these phenotypes are less severe than observed in the HelqΔC/ΔC homo- 
zygous mice, the data reveal that loss of a single allele of Helq confers 
haploinsufficiency in mice. 

The phenotype of HelqΔC/ΔC mice is similar to that observed in mouse 
models of Fanconi anaemia. Haematopoietic stem and progenitor cell 
(HSPC) defects and sensitivity to ICLs are also hallmarks of Fanconi 
anaemia and were therefore examined in Helq mutants. Although bone 
marrow HSPCs from HelqΔC/ΔC mice exhibit hypersensitivity to the 
ICL agent mitomycin C (MMC; Extended Data Fig. 4a), HelqΔC/ΔC 
HSPCs were not compromised in numbers (Extended Data Fig. 4b, c), 
proliferative capacity (Extended Data Fig. 4d, e) or engraftment 
(Extended Data Fig. 4f-i). HELQ-deficient cells exhibited hypersensi- 
tivity to replication blocking agents such as MMC and camptothecin 
(CPT; Fig. 2a, b), but not to ionizing radiation or ultraviolet radiation 
(Fig. 2c, d). HelqΔC/ΔC cells also exhibited significantly more chromatid 
breaks and radial chromosomes than control cells upon treatment with 
mitomycin C (MMC; Fig. 2e, j). Silencing of HELQ by short interfering 
RNA (siRNA) in human cells resulted in similar phenotypes (Extended 
Data Fig. 4j). 

To examine the phenotypic relationship between HELQ and the 
Fanconi anaemia pathway we generated HelqΔC/ΔC Fancd2−/− double- 
mutant mice (the Fancd2 strain is described in ref. 8). Double mutants 
were born in Mendelian ratios (Extended Data Fig. 4k) and growth and 
appearance were normal. Surprisingly, testes from double mutants were 
significantly smaller than single mutants and all tubules were atrophic, 
containing only Sertoli cells (Fig. 2f, g). Similarly, double-mutant cells 
exhibited greater sensitivity to MMC and CPT than either single- 
mutant (Fig. 2h, i) and spontaneous and MMC-induced chromosomal 
aberrations were significantly increased over the Helq single mutant 
Figure 1 | A mouse model of HELQ deficiency. a, Helq genomic locus. Base 
pairs are indicated above. Introns are not to scale; exons are roughly to scale 
with black bar indicating 1 kilobase. Location of the β-geo gene trap and 
genotyping primers are shown. b, HELQ domain architecture: amino acids are 
indicated, red bar spans epitope recognized by the HELQ antibody used for 
western blotting. c, Helq genotype PCR. d, Lysates from ear fibroblasts of Helq 
mice probed for HELQ. e, The number of litters (black) and pups (pink) 
generated per 21-day gestational interval. f, Helq testis images and weights. 

g, Histological sections of Helq gonads. ‘A’ denotes atrophic tubules; asterisks 
denote developing ovarian follicles. h, Left, 5-day-old neonatal testes labelled 
with the Sertoli marker WT1 (brown) and haematoxylin (blue), to reveal 
spermatogonia (asterisks). Right, quantification of spermatogonia (SG). 

i, Epithelial and stromal tumour-free survival of Helq mice. j, k, Frequency of 
mice with two or more primary tumours (j) and female-specific pathology (k). 


(Fig. 2j). These results suggest that HELQ and FANCD2 act in parallel 
ICL repair pathways. 

To gain insight into the function of HELQ during DNA repair, we 
performed proteomic analysis of proteins co-precipitated with Flag- 
tagged HELQ. Mass spectrometry revealed several checkpoint and DNA 
repair proteins, including the replication checkpoint kinase ATR, the 
single-stranded DNA binding protein RPA70 (also known as RPA1), 
the four components of the BCDX2 complex required for homologous 
recombination, and the FANCD2-FANCI heterodimer that functions 
Figure 2 | Helq damage sensitivity and Helq Fancd2 double-mutant 
phenotypes. a-d, Clonogenic survival assays of immortalized cells exposed to 
the indicated damaging agents. e, Metaphase chromosomes (metas) from 
immortalized cells treated with 5 ng ml⁻¹ MMC for 16 h; frequency of 
metaphases with radials + MMC are indicated. f, Testis weights of wild-type, 
Helq single- and Helq Fancd2 double-mutant mice. g, Histological sections of 
testes. A, atrophic; S, Sertoli cell only. h, i, Clonogenic survival assays as in 

a. j, Metaphase aberrations from immortalized cells treated with or without 
10 ng ml⁻¹ MMC for 16 h. 


in the Fanconi anaemia pathway (Fig. 3a, b and Extended Data Fig. 5a). 
Interaction partners identified by mass spectrometry were confirmed 
via immunoprecipitation/western blot (Fig. 3c). Reciprocal immuno- 
precipitation of RAD51C confirmed its association with Flag-HELQ 
(Fig. 3d), and endogenous HELQ was detected in RAD51C immuno- 
precipitates from 293T cells and vice versa (Fig. 3e). Recombinant BCDX2 
proteins purified from either insect cells or bacteria also bound to puri- 
fied Flag-HELQ but not to ALC1-Flag or Flag controls (Fig. 3f). Intri- 
guingly, XRCC3, a component of the RAD51C-XRCC3 (CX3) RAD51 
paralogue complex, was not detected by either mass spectrometry or 
western blotting of Flag-HELQ immunoprecipitates (Fig. 3c). Further- 
more, HELQ was not found in reciprocal immunoprecipitates with 
endogenous XRCC3 (Extended Data Fig. 5c). These data indicate that 
HELQ interacts directly with the BCDX2 complex but not with the 
CX3 complex. 

As ATR, RPA70, the BCDX2 complex and FANCD2-FANCI all 
respond to stalled replication forks, we examined the localization of 
green fluorescent protein-tagged HELQ (HELQ-GFP) expressed in 
NIH 3T3 cells. Subcellular fractionation revealed that HELQ-GFP 
is enriched on chromatin in response to replication fork stalling with 
either MMC or aphidicolin, and this is compromised by ATR inhibi- 
tion (Fig. 3g and Extended Data Fig. 5e-g). 

To determine the possible underlying cause of the defect in HELQ- 
deficient cells we examined replication dynamics, indices of checkpoint 
activation, Fanconi anaemia pathway activation, double-stranded break 
(DSB) formation, and the integrity of homologous recombination. 
Replication fork extension rates in HelqΔC/ΔC cells were significantly 
lower than in wild-type cells (Extended Data Fig. 6d, e) and this was 
exacerbated by treatment with CPT (Extended Data Fig. 6d). Replication 
Figure 3 | HELQ interacts with DNA replication stress response factors. 

a, Unique and total peptides identified by mass spectroscopy analysis of 
HELQ-Flag co-immunoprecipitates isolated from 293 cells. b, HELQ 
interaction network based on the mass spectroscopy results (coloured lines) 
and reported interactions from BIOGRID, STRING and MINT databases 
(dashed lines). c, Western blots of input and Flag immunoprecipitates (IP) from 
HELQ-Flag and Flag control samples. d, Reciprocal immunoprecipitates of 
endogenous RAD51C with HELQ-Flag. e, Endogenous HELQ and RAD51C 
immunoprecipitates from 293T cells. f, Purified, His-tagged BCDX2 complexes 
(top) were incubated with Flag-complexed beads (bottom) to test for a direct 
interaction. Flag and ALC1-Flag are shown as negative controls. NT, non- 
transfected. g, Western blot analysis of whole cell extracts (WCE) and 
chromatin fractions (Chr) from NIH3T3 HELQ-GFP-expressing cells 
treated with or without 100 ng ml⁻¹ MMC for 24 h. 


fork tract asymmetry was also evident in mutants relative to controls 
indicative of increased fork stalling/collapse (Extended Data Fig. 6f, g). 

Checkpoint activation as measured by phosphorylation of ATM, 
CHK1 and CHK2 (also known as CHEK1 and CHEK2, respectively) 
and γH2AX in response to DNA damage was unaffected by loss of HELQ 
in either mouse or human cells (Extended Data Fig. 7a—d). Furthermore, 
HELQ-deficient cells exhibited constitutive FANCD2 monoubiquitin- 
ation, indicating that HELQ is dispensable for this modification 
(Fig. 4a). Assessment of RAD51 recruitment to damaged replication forks 
revealed that RAD51 is enriched on chromatin in response to MMC 
treatment in HELQ-deficient mouse and human cells (Fig. 4b and Extended 
Data Fig. 7e, 8a) and RAD51, RPA (also known as RPA1) and γH2AX 
accumulate in repair foci (Fig. 4c and Extended Data Fig. 7e-g). However, 
RAD51 and γH2AX persisted on chromatin and remained present in 
repair foci at later time points in HELQ-deficient mouse and human 
cells (Fig. 4b, c and Extended Data Fig. 7e, f), suggesting that the defect 
in the absence of HELQ occurs at a step downstream of RAD51 recruit- 
ment to damaged replication forks. Pulsed field gel electrophoresis 
revealed that DSBs form in Helq and Fancd2 single and double mutant 
cells after MMC treatment, but that these lesions persist at later time 
points, indicating that DSBs induced at ICLs are not efficiently repaired 
(Fig. 4d). siRNA-induced depletion of HELQ resulted in a two- to 
threefold decrease in homologous recombination efficiency, implic- 
ating HELQ in promoting homologous recombination (Fig. 4e and 
Extended Data Fig. 8b). Furthermore, clonogenic survival of HELQ- 
deficient mouse and human cells was significantly compromised in 
response to poly-ADP ribose polymerase (PARP) inhibition, which is a 
hallmark of homologous recombination-deficient cells (Fig. 4f and 
Figure 4 | HELQ influences DNA repair and homologous recombination 
efficiency. a, Lysates from immortalized mouse cells, grown under 
physiological O₂ and treated with or without 3 μM aphidicolin (APH) for 6 h, 
were probed for FANCD2. Wild-type (WT), HELQ-deficient (HQ), 
unmodified (S) and ubiquitinated (L) forms of FANCD2 were used. 

b, Chromatin fractions from immortalized mouse cells, probed for RAD51, 
histone H3 and α-tubulin at the indicated time points (hours) following 
treatment with or without 100 ng ml⁻¹ MMC. c, Left, representative images of 
RAD51 focus formation in immortalized mouse cells at the indicated time 
points (hours) following treatment with 1 μM MMC. Right, quantification of 
RAD51 foci at the indicated time points. d, Pulsed field gel electrophoresis of 
genomic DNA from immortalized cells treated with or without 1 μM MMC for 
1h and recovered for the indicated number of hours. un, undamaged. Wells, 
intact DNA; arrow, band containing large chromosomal fragments (10-0.45 
megabases); below the arrow, smaller fragments resolved by size. 

e, Homologous recombination frequencies in direct repeat (DR)-GFP reporter 
cells treated with the indicated siRNAs. LUC, luciferase. *P<0.05; **P<0.001. 
f, Clonogenic survival assays of immortalized mouse cells exposed to PARP 
inhibitor (PARPi). Error bars represent s.e.m. 


Extended Data Fig. 8c). It is notable that the HELQ interacting protein 
and BCDX2 complex factor, RAD51D, is also required for PARP 
inhibitor resistance. 

In summary, our results uncover a critical role for HELQ in germ 
cell maintenance and tumour suppression in mammals, which we 
attribute to a role in replication-coupled DNA repair. The interaction 
between HELQ and the RAD51 paralogue BCDX2 complex may provide 
molecular insight into the HELQ phenotype and its role in tumori- 
genesis, as the BCDX2 complex functions to promote replication- 
coupled homologous recombination, RAD51C has been implicated 
in a Fanconi anaemia-like disorder, and mutations in RAD51B and 
RAD51D are associated with hereditary ovarian cancer in humans. 
These findings suggest HELQ as a strong candidate for screening in 
human cancers and also explain why mutations in D. melanogaster 
homologues of HELQ, RAD51, and the two RAD51 paralogues (MUS301 
(also known as SPN-C), SPN-A, SPN-B and SPN-D, respectively) 
confer a very similar phenotype. Finally, our findings help to explain 
the prevalence of non-synonymous variants in HELQ, which are sig- 
nificantly associated with upper aerodigestive tract cancers, particu- 
larly amongst smokers; and variants in HELQ associated with early 
menopause, which may reflect the germ cell defects and ovarian 
dysgenesis observed in HELQ-deficient mice. 
HELQ-deficient mice were generated from a commercially available embryonic 
stem cell clone (clone ID: RRF112, Bay Genomics, University of California, Davis). 
All strains were maintained on a mixed B6/129 background. Testes for histology 
were fixed in Bouin’s solution and periodic acid-Schiff/haematoxylin stained; all 
other tissues were fixed in neutral buffered formalin and stained with haematox- 
ylin and eosin. Bone marrow cells were collected and analysed by FACS, colony 
formation and transplantation as described previously. HELQ-Flag, Flag, and 
ALC1-Flag cell lines were generated using the HEK293 Flp-In system according to 
the manufacturer’s protocol (Invitrogen). HELQ-GFP-expressing NIH3T3 cells 
were generated according to the bacterial artificial chromosome recombineering 
method described previously. Pulsed field gel electrophoresis was carried out sim- 
ilarly to the method described in ref. 24. Isolation of HELQ-Flag immunocomplexes 
entailed lysis of cells in the presence of benzonase to prevent non-specific DNA- 
bridging of proteins. Flag immunoprecipitates for mass spectrometry analysis 
were eluted by boiling in SDS-PAGE sample buffer and processed using standard 
methods. The Biological General Repository for Interaction Data sets (BioGRID, 
http://thebiogrid.org/), the Molecular INTeraction database (MINT, http://mint. 
bio.uniroma2.it/mint) and Search Tool for the Retrieval of Interacting Genes/ 
Proteins database (STRING, http://string-db.org/) were used to compile the inter- 
action network. For assays of in vitro binding to purified His-tagged BCDX2, Flag- 
tagged proteins were purified by washing immune-complexed beads four times 
with buffer containing 1 M NaCl. Purified Flag proteins were then incubated with 
recombinant BCDX2 complex, washed and eluted for analysis. Chromatin fractiona- 
tion was carried out using modified versions of previously established methods. 
For siRNA transfections of U2OS cells, cells were subjected to two rounds of reverse 
transfections using siGENOME siRNA and Dharmafect1 (Thermofisher) accord- 
ing to the manufacturer’s protocol. Histology/immunohistochemistry, primary 
cell line derivation and immortalization, immunofluorescence, assays for clonogenic 
survival, metaphase aberrations, micronuclei and DNA combing were carried out 
using standard procedures. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Mouse strains, maintenance, localization of genetrap, genotyping. All mice 
were housed and maintained according to the Home Office guidance outlined in 
the Animals (Scientific Procedures) Act. All strains were maintained on a mixed B6/ 
129 background. 

A commercially available embryonic stem cell clone (clone ID: RRF112, Bay 
Genomics, University of California, Davis) harbouring the trapped allele of Helq 
was obtained. The position of the β-geo gene trap cassette from the pGTOLxF 
vector was determined via splinkerette PCR using a modified version of the 
method described in ref. 28. In brief, 3 μg genomic DNA from Helq gene-trapped 
embryonic stem cells was digested overnight at 37 °C with BfaI. Splinkerette primers 
were annealed in SuRE Buffer M (Roche) by heating to 95 °C, followed by cooling 
by 1 °C increments for 15 s each. A total of 40 pmol of the annealed splinkerette 
adaptor was ligated to 600 ng of BfaI-digested genomic DNA, followed by passage 
over a PCR purification column (Qiagen). The splinkerette-adapted genomic DNA 
was re-digested with BfaI for 1 h (eliminating potential background due to splin- 
kerette ligation to partially digested genomic DNA fragments) and re-purified. 
Primary and nested PCRs were performed using genetrap- and splinkerette-specific 
primers, with 0.4% of the primary PCR used in the nested round. Nested PCR 
products were gel purified and sequenced using standard methods. 

The HelqΔC mouse strain was generated using standard transgenic technology. 
For more background and discussion pertaining to this and other aspects of the 
manuscript, see elsewhere in the Methods and Extended Data. FANCD2-deficient 
mice were described previously. Mice were identified using standard ear snip 
methods, and snips were used for genomic DNA preparation using the HotSHOT 
method for use in genotype PCR. Helq wild-type, heterozygous and mutant mice 
were genotyped in a single reaction using the following primers: 
pGTOLxF_F2-CAGGGTTTTCCCAGTCACGAC (genetrap-specific primer), 
mHELQint11_F8-GTCCTTGTGCCCAAAGTACAG (wild-type-specific primer), 
mHELQint11_R5-CCTAGTGTGGCTTATCTCCTTCTTC (common primer). Fancd2 mice were geno- 
typed according to the previously described method. 

Breedings to establish Mendelian ratios and fertility consisted of continuously 
mated, individually housed pairs. The Helq Fancd2 double-mutant strain was 
established from mating of double heterozygous Helq+/ΔC Fancd2+/− mice. Weights 
of Helq mice were measured weekly starting at 10 days post-partum. The tumour 
watch cohort was established using littermate controls wherever possible and mice 
were regularly monitored for signs of deterioration using a scoring system that will 
be described in a separate publication. Mice were euthanized before terminal end 
points were reached. 

Statistics. For survival study, sample size was estimated using standard power 
calculation methods in order to measure a difference of 3-4 months in survival 
between mutant and control groups. For Helq and Helq Fancd2 matings, devia- 
tions from expected Mendelian ratios were tested by Chi-squared analysis 
(P > 0.25 for both strains). For fertility analysis, the number of litters and pups 
were tested using Kruskal-Wallis analysis of variance. The Gaussian approxi- 
mation of P values indicated that medians varied significantly between the groups 
(P = 0.0019 for litters, P = 0.0014 for pups). Dunn’s multiple comparison post- 
test was used to compare specific sample pairs. Results of P value calculations for 
both litters and pups were the same, and are the values indicated on the graph 
(P <0.001 for control versus mutant pairs, P = not significant for control versus 
male or female mutant pairs). For wild-type versus Helq mutant testes weights, the 
Mann-Whitney test was used to analyse whether the weights differed between the 
two groups. Gaussian approximation was used for calculation of the indicated P 
value (P < 0.0001). For wild-type and Helq mutant testes weights versus age, linear 
regression was used to generate slopes of best-fit lines which were tested for 
deviation of slope from zero. R² and P values indicated on the graph demonstrate 
that although there is a correlation between reduced testes weight and age in wild- 
type mice (P = 0.0091), there is no correlation between testes weight and age for 
Helq mutants (P = 0.78). For spermatogonia numbers in wild-type versus Helq 
mutant neonatal testes, the Mann-Whitney test was used to analyse whether 
spermatogonia differed between the two groups. Gaussian approximation was 
used for calculation of the indicated P value (P< 0.0001). For MMC-induced 
metaphase aberrations in wild-type versus Helq mutant cell lines, the Mann- 
Whitney test was used to analyse whether radial chromosomes per metaphase 
differed between the two groups (P< 0.0001). For wild-type versus Helq and 
Fancd2 single- and double-mutant testes weights, the Kruskal-Wallis analysis 
of variance test was used. Gaussian approximation of the P value indicated that 
medians varied significantly between the groups (P< 0.0001). Dunn’s multiple 
comparison post-test was used to compare specific sample pairs. Results of P value 
calculations are indicated on the graph (P < 0.001 for both wild-type versus Fancd2 
single mutant and wild type versus Helq Fancd2 double mutant). For MMC- 
induced metaphase aberrations in wild-type versus Helq and Fancd2 single mutants 
and Helq Fancd2 double mutants, the Kruskal-Wallis analysis of variance test 
was used. Gaussian approximation of the P value indicated medians varied signi- 
ficantly between the groups (P < 0.0001). Dunn’s multiple comparison post-test 
was used to compare specific sample pairs. Results of P value calculations for MMC- 
damaged samples are indicated on the graph (P < 0.001 for wild-type versus single 
and double mutants, P<0.01 for Helq mutants versus double mutants, and 
P = not significant for Helq versus Fancd2 single mutants and for Fancd2 mutants 
versus double mutants). For HELQ-GFP chromatin recruitment upon ATR inhib- 
itor treatment, Student’s t-test was performed to determine whether the observed 
differences were statistically significant. For DNA combing of replication tracts 
from wild-type versus Helq mutant cells at atmospheric O₂, and tracts from wild- 
type, Helq and Fancd2 single, and Helq Fancd2 double mutants at physiological O₂, 
the Kruskal-Wallis analysis of variance test was used. Gaussian approximation 
of the P values indicated that medians varied significantly between the groups 
(P<0.0001 for both experiments). Dunn’s multiple comparison post-test was 
used to compare specific sample pairs. Results of P value calculations are indicated 
on the graphs (P< 0.01 for wild type versus Helq mutants and for undamaged 
versus CPT damaged Helq mutant cells; P< 0.001 for wild type versus single and 
double mutants). For wild type versus Helq mutant tumour watch, the Kaplan- 
Meier epithelial and stromal tumour-free survival curve was analysed with Mantel- 
Cox log-rank test for P value calculation (P = 0.009). 
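
The chi-squared test for deviation from Mendelian ratios can be sketched as a goodness-of-fit test against the expected genotype proportions. The example below assumes a 1:2:1 ratio, as expected from a heterozygous intercross. The function name and the counts in the usage note are illustrative and are not data from the study.

```python
from math import exp


def mendelian_chi_squared(observed, ratio=(1, 2, 1)):
    """Chi-squared goodness of fit for three genotype classes.

    observed: counts of wild-type, heterozygous and homozygous mutant pups.
    Returns (chi-squared statistic with 2 d.f., P value).
    """
    if len(observed) != 3 or len(ratio) != 3:
        raise ValueError("this sketch handles exactly three genotype classes")
    total = sum(observed)
    weight = sum(ratio)
    expected = [total * r / weight for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the upper tail is exactly exp(-chi2 / 2).
    p_value = exp(-chi2 / 2.0)
    return chi2, p_value
```

For illustrative counts of 30, 50 and 20 pups the statistic is 2.0 and P is about 0.37, which would not indicate a deviation from the expected ratio.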

Histology, immunohistochemistry sample preparation and reagents. For all 
histology, samples were paraffin embedded and sectioned at 4 μm. Testes for 
histology were fixed in Bouin’s solution, Periodic acid-Schiff stained and haema- 
toxylin counter-stained; all other tissues, including post-mortem tissues for ana- 
lysis of tumour watch cohort, were fixed in 10% neutral buffered formalin (NBF) 
and stained with haematoxylin and eosin. 

Immunohistochemistry was carried out using standard methods. In brief, testes 

for immunohistochemistry were NBF fixed and sections from adult mice were 
processed for c-KIT staining by microwaving in 0.01 M citrate buffer, pH 6, to 
unmask antigens. After incubation with c-KIT primary antibodies (Dako A4502, 
1:600), samples were incubated with biotinylated secondary antibody (Vector) 
followed by incubation with Avidin Biotin Complex (Vector); slides were developed 
in 3,3′-diaminobenzidine (DAB) substrate (Vector) and counterstained in haema- 
toxylin. Neonatal testis sections were similarly processed and labelled with antibodies 
against WT1 (Santa Cruz sc-192, 1:450). 
Cell line derivation. Ear fibroblasts for primary and SV40 immortalized cultures 
were generated as follows: mice were euthanized and ear tissue was collected using 
sterile scissors, ear fragments were rinsed twice in 70% ethanol followed by two 
rinses in PBS supplemented with 100 µg ml⁻¹ kanamycin. Tissue was transferred
into 0.3 ml of protease solution (4 mg ml⁻¹ each of collagenase D and dispase in
DMEM; filter sterilized), and incubated at 37 °C for 45 min. In total, 1.5 ml DMEM
containing 10% FBS, 1× glutamine and 5× antibiotic-antimycotic solution was
added to protease solution containing ear fragments, and samples were incubated
at 37 °C overnight. Cells were dissociated by pipetting, passed through a 40-µm
mesh cell strainer, and plated in DMEM as above except using 1× antibiotic-
antimycotic solution. Cells were passaged upon reaching confluence to five dishes,
and upon reaching confluence, cells were frozen at passage 1 or used immediately 
for immortalization or experiments. 

Fibroblasts were immortalized via transfection with a vector expressing SV40 
large T antigen. Constitutively expressed HELQ-Flag, Flag, and ALC1-Flag cell 
lines were generated using the 293 Flp-In system according to the manufacturer’s 
protocol (Invitrogen). NIH3T3 cells stably expressing GFP-tagged mouse HELQ 
(consisting of a bacterial artificial chromosome (BAC) containing the entire Helq 
promoter and genomic locus) were generated according to the BAC recombineer- 
ing method described previously’. These and all other cell lines used in this study 
(293T, HeLa, NIH3T3 and U2OS) were grown in DMEM supplemented with 10%
FBS and L-glutamine. Cells were grown in 5% CO2 incubators at atmospheric O2
concentrations (~21%) with noted exceptions where samples were cultured at
physiological O2 concentrations (~5%).

For HELQ-GFP transient transfections in HeLa cells (Extended Data Fig. 3d),
human HELQ was cloned into the pcDNA6.2/C-EmGFP-DEST vector using
Gateway technology (Invitrogen). The vector was transfected into HeLa cells with 
Lipofectamine 2000 using the manufacturer’s protocol (Invitrogen). Live or para- 
formaldehyde (PFA)-fixed cells (fixed cells were counterstained with DAPI (4’,6- 
diamidino-2-phenylindole)) were visualized 48-96 h after transfection under epi- 
fluorescence using a Zeiss Axio Imager M1 microscope with an ORCA-ER camera 
(Hamamatsu), and images were acquired using the Volocity software (Improvision, 
Perkin Elmer). 
Clonogenic survival, metaphase spreads, growth and micronuclei analyses. 
For all experiments, fibroblast lines established from littermates or siblings were 
used wherever possible. Experiments involving primary cells were conducted in 
physiological O2 using cell lines of similar passage number.
For clonogenic survival assays, SV40 immortalized mouse ear fibroblasts and
siRNA-treated U2OS cells were plated in triplicate on 10-cm dishes at clonal den- 
sity, allowed to adhere for 8-16h, and damage treatments administered (CPT 
medium was changed after 24h). After 8-10 days of growth, plates were rinsed, 
fixed/stained in 20% ethanol/4% crystal violet (w/v), rinsed in distilled water and 
colonies tabulated. All results were normalized to untreated to adjust for plating 
efficiency and determine percentage survival. Survival experiments were carried 
out on at least two independent sets of mutant and control mouse cell lines, and in 
most cases cell lines were tested in at least two independent experiments. Similar 
results were obtained across all experiments and sets of cell lines. 
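
The normalization described above (colony counts corrected for plating efficiency) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the number of cells plated and the colony counts are hypothetical, and triplicate plates are simply averaged.

```python
from statistics import mean


def percent_survival(treated_counts, untreated_counts,
                     cells_plated_treated, cells_plated_untreated):
    """Percentage survival of treated cells, normalized to untreated plates.

    Plating efficiency is the fraction of plated untreated cells that form
    colonies; dividing by it removes plating losses from the treated result.
    """
    plating_efficiency = mean(untreated_counts) / cells_plated_untreated
    treated_fraction = mean(treated_counts) / cells_plated_treated
    return 100.0 * treated_fraction / plating_efficiency


# Hypothetical triplicate colony counts, 500 cells plated per dish.
survival = percent_survival([30, 28, 32], [60, 58, 62], 500, 500)
```

With these made-up counts the untreated plating efficiency is 12% and the treated plates retain half of that, so `survival` comes out at 50%.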

For analysis of metaphase aberrations, SV40 immortalized cells were damaged 
as indicated and treated with colcemid (2 × 10⁻⁷ M) for 30 min, collected, swelled
in hypotonic solution (0.075 M KCl) for 7 min at 37 °C, fixed and washed in ice- 
cold methanol-acetic acid (3:1), dropped on humid slides and briefly steamed over 
a 65°C bath. Slides were dried, stained with Giemsa (Sigma) for 10 min, rinsed 
with distilled water, and coverslips were mounted (Permount, Fisher). For each 
sample ≥40 spreads were scored.

Growth kinetics of primary cells were determined using a modified 3T3 protocol 
to calculate cumulative population doublings. In brief, primary cells were collected, 
counted and 150,000 cells reseeded in triplicate 10-cm dishes every third day. Cumu-
lative population doublings were calculated using the formula: log₁₀(n/n₀) × 3.32,
in which n = number of cells collected after growth and n₀ = number of cells
seeded. Similar growth kinetics were obtained for three different mutant and con- 
trol mouse embryonic fibroblasts and embryonic fibroblast cell line pairs. 
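
A minimal sketch of the population-doubling calculation, log10(n/n0) × 3.32, is given below. The factor 3.32 approximates 1/log10(2), so each term is the base-2 logarithm of the fold expansion. The function names and the example cell counts are hypothetical; only the formula and the 150,000-cell seeding number come from the text.

```python
import math


def population_doublings(n_collected, n_seeded):
    """Doublings in one passage: log10(n / n0) x 3.32."""
    return math.log10(n_collected / n_seeded) * 3.32


def cumulative_population_doublings(counts, n_seeded=150_000):
    """Running total of doublings over successive passages.

    `counts` holds the number of cells collected at each passage; every
    passage is assumed to restart from `n_seeded` cells.
    """
    total = 0.0
    curve = []
    for n_collected in counts:
        total += population_doublings(n_collected, n_seeded)
        curve.append(total)
    return curve


# Hypothetical counts at three consecutive passages (every third day).
cpd_curve = cumulative_population_doublings([600_000, 300_000, 150_000])
```

A passage that returns exactly the seeded number of cells contributes zero doublings, which is why the last point of the example curve equals the second.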

Micronuclei were examined by seeding SV40 immortalized embryonic fibro- 
blasts in 6-well dishes containing glass coverslips. The following day, cells were 
fixed in 4% PFA and stained with DAPI. For each sample ≥100 cells were scored.
Bone marrow protocols, antibodies/reagents, analysis. Bone marrow cells were 
collected from 3-4-month-old control and HelqΔC/ΔC mice and analysed for hae-
matopoietic stem and progenitor cells (HSPCs) as described previously. For
LSK (lineage⁻ Sca-1⁺ c-Kit⁺) staining, cells were stained in Hank’s balanced salt
solution containing 2% FBS and 10 mM HEPES buffer (Gibco) using biotinylated-
anti-lineage antibody cocktail (anti-Mac1α, Gr-1, Ter119, CD3e, CD4 and CD8a),
phycoerythrin (PE)-Cy-7-anti-Sca-1 antibody (clone E13-161.7), and APC-anti- 
c-Kit antibody (clone 2B8), followed by staining with PE-streptavidin secondary 
antibody (all primary and secondary antibodies from BD Biosciences). The samples 
were acquired using a BD FACSAria high-speed sorter. 

For c.f.u.-c assays, bone marrow cells were seeded in 12-well plates at a density of 
7 × 10⁴ cells per well in mouse MethoCult medium M3434 (StemCell Technologies)
and haematopoietic colonies (c.f.u.-c) were counted at 7-10 days after culture. 
c.f.u.-c assays were used to determine the survival of bone marrow in presence of 
MMC. To determine the ex vivo clonal growth of murine HSPCs, a cobblestone 
area-forming cell (CAFC) assay was performed by a limiting dilution analysis of 
bone marrow in micro-cultures using the bone marrow stromal cell line FBMD-1 
(refs 8, 23). This assay quantifies a spectrum of haematopoietic cells that is well- 
validated to compare with other functional assays. Specifically, day-7 and day-14 
CAFC correspond to early progenitor cells and to c.f.u.-spleen-day-12 cells, whereas 
the more primitive haematopoietic stem cells with long-term repopulating ability 
correspond to day-28 CAFC”. 

Bone marrow transplantation was performed as described previously’. In brief, 
bone marrow cells (5 × 10°) from control or HelqΔC/ΔC mice (CD45.2⁺) were
mixed with 2.5 × 10° bone marrow supporting cells from CD45.1⁺ congenic mice
and transplanted into lethally irradiated CD45.1⁺ congenic recipient mice. The donor
cell engraftment efficiency in the recipient mice, after 17 weeks post-transplant, 
was determined by staining peripheral blood leukocytes with FITC-labelled anti- 
CD45.2 (clone 104) antibody. The percentage of donor-derived T cells, B cells and 
myeloid cells was determined by co-staining with PE-labelled anti-CD3e (clone 
145-2C11), anti-B220 (clone RA3-6B2) and anti-Mac-1/Gr-1 antibodies (clones 
M1/70 and RB6-8C5), respectively, and analysed on a FACScan instrument 
(Becton Dickinson). All antibodies were from BD Biosciences. 

Mass spectrometry and proteomics. HELQ-Flag and Flag control cells were 
collected and lysed in benzonase lysis buffer (20 mM Tris-Cl, pH 7.5, 75 mM
NaCl, 10% glycerol, 2 mM MgCl2, 0.5% NP40, 30 U ml⁻¹ benzonase, protease
inhibitors). NaCl concentration was adjusted to 150 mM, EDTA to 3 mM and
lysates were cleared by centrifugation. Supernatants were pre-cleared with Protein 
G agarose beads for 30 min at 4 °C. Pre-cleared lysates were incubated with anti- 
Flag affinity agarose resin (Sigma) for 4h at 4°C. Beads were washed five times 
with wash buffer (20mM Tris-Cl, pH 7.5, 150mM NaCl, 3mM EDTA, 0.5% 
NP40) and once with PBS. Bound proteins were eluted by boiling in SDS-PAGE 
sample buffer and eluates were resolved on NuPAGE Bis-Tris gels (Invitrogen) and 
stained with Sypro Ruby (Invitrogen). Gel slices were excised and processed for mass 
spectrometry using the Janus automated liquid handling system (PerkinElmer). 
Peptides were analysed by nanoscale capillary liquid chromatography-electrospray
ionization/multi-stage mass spectrometry (LC-ESI MS/MS), data were processed
using Mascot Distiller (Matrix Science) and exported to Scaffold for viewing 
(Proteome Software). 

The Biological General Repository for Interaction Data sets (BioGRID,
http://thebiogrid.org/), the Molecular INTeraction database (MINT,
http://mint.bio.uniroma2.it/mint), and Search Tool for the Retrieval of Interacting
Genes/Proteins database (STRING, http://string-db.org/) were used to compile
the protein interaction network.
Cell lysates, in vitro binding assay and fractionation for western blot analyses. 
All cell lines used in this study were short tandem repeat-profiled and tested for 
mycoplasma infection before use. All lysis buffers were supplemented with prote- 
ase inhibitor cocktail (Roche) and phosphatase inhibitors (Sigma). 

For validation of mass spectrometry data, HELQ-Flag- and Flag-expressing 
cells were used. (This was due to our inability to validate these interactions using 
endogenous HELQ, stemming from the fact that it is expressed at very low levels in 
most human cell lines, and no antibodies were found to reliably immunoprecipi- 
tate the human version. Validation using endogenous mouse HELQ was similarly 
hindered by a lack of reagents available for detection of the mouse RAD51 para- 
logues.) Cells were lysed in the presence of benzonase and 2 mg of total protein 
were immunoprecipitated with anti-Flag affinity resin as above. Beads were washed, 
bound proteins eluted with 1 X NuPAGE LDS sample buffer and analysed by 
western blot. Similar methods were employed using lysates prepared from 293T 
cells to examine endogenous HELQ or RAD51C coimmunoprecipitates. 

For in vitro binding assays, HELQ-Flag, ALC1-Flag and Flag cells were lysed in 
the presence of benzonase and pre-cleared lysate was used for Flag immunopre- 
cipitate as described above. Flag-immunocomplexed beads were then washed four 
times with a modified wash buffer containing 1 M NaCl to remove bound co- 
precipitates, and once with in vitro binding buffer (20 mM Tris-Cl, pH 7.5, 280 mM 
NaCl, 3 mM EDTA, 0.5% NP40). Washed beads were incubated with recombinant 
RAD51 paralogue BCDX2 complex (gift of S. West’s laboratory) in binding buffer
for 4h at 4 °C and washed four times with the same buffer. Eluates were analysed by 
western blot. 

GFP-tagged HELQ was stably expressed in NIH3T3 cells using a BAC recom- 
bineering method to C-terminally Flag/GFP-tag the BAC-containing full-length 
genomic Helq, which included the endogenous promoter. This allowed HELQ-
GFP to be expressed at physiological levels.

For chromatin fractionation of embryonic fibroblasts and siRNA-treated U2OS
cells, cells were treated with or without 3 µM aphidicolin for 6 h or with or with-
out 1 mM MMC for 24 h, collected and fractionated using a modified version of
the method described in ref. 26: pellets were re-suspended in CSK buffer (10 mM
PIPES, pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 1 mM EGTA, pH 7,
0.5% Triton X-100), incubated for 10 min on ice (a small fraction of this was 
removed and SDS-PAGE sample buffer was added to obtain WCEs), pelleted at 
low speed and supernatants reserved as soluble fraction. Pellets were washed in 
CSK buffer, and re-pelleted. Pellets were re-suspended in benzonase CSK buffer 
(10 mM PIPES, pH 7.5, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 0.5%
Triton X-100, 0.1 U µl⁻¹ benzonase), lysates were incubated for 30 min at 37 °C,
pelleted and supernatants reserved for chromatin fraction. Pellets were re-suspended 
in high-salt CSK (recipe as above except NaCl was added to 500 mM), lysates were 
incubated for 10 min on ice, cleared at high speed and supernatants pooled with 
benzonase CSK lysates to yield chromatin extracts. In total, 25 µg of soluble and
10 µg of chromatin proteins were analysed by western blotting.
Western blot, immunoprecipitated antibodies/reagents and siRNA oligonu- 
cleotides. Precast NuPAGE Bis-Tris or Tris-acetate gels and electrophoresis system 
were by Invitrogen. Western blotting transfers were carried out in BioRad transblot 
chambers and all blots were blocked and probed in 5% milk-phosphate buffer 
saline triton X100 (PBST) with the exception of p-CHK1 blots, which were processed 
in 3% BSA-Tris buffered saline triton X100 (TBST). Mouse and rabbit horseradish 
peroxidase-conjugated secondary antibodies were from ThermoFisher, and sig- 
nals were visualized with ECL western blotting detection reagent (Amersham) or 
SuperSignal West Femto reagent (Thermoscientific). 

Antibodies used for western blot analysis: Flag (Sigma F1804, 1:2,000), HELQ 
(Santa Cruz 81095, 1:200), His (Clontech 631212, 1:2,000), PARP1 (Trevigen 
4338-ML-50, 1:1,000), CHK1 (Sigma C9358, 1:500), S345-P-CHK1 (Cell Signalling 
2348, 1:500), CHK2/p-CHK2 (Upstate 05-649, 1:400), ATM (Sigma A1106, 1:2,000), 
S1981-p-ATM (Cell Signalling 4526, 1:1,000), histone H3 (Abcam 10799, 1:2,000),
α-tubulin (Sigma T6199, 1:2,000), RAD51 (Santa Cruz 8349, 1:200), FANCD2
(Epitomics 2986-1, 1:1,000), γH2AX (Cell Signalling 2577, 1:1,000), RPA32 (Abcam
12F3,3, 1:1,000), BRCA2 (Santa Cruz 8326, 1:200), TFIIH p89 (Santa Cruz 293, 1:200).
All RAD51 paralogue antibodies were a kind gift from S. West’s laboratory, as
described in ref. 30: RAD51B (IH3 mouse monoclonal antibody, 1:500), RAD51C
(2H11 mouse monoclonal antibody, 1:500), RAD51D (5B3 mouse monoclonal
antibody, 1:400), XRCC2 (7B7 mouse monoclonal antibody, 1:400), XRCC3 (10F1
mouse monoclonal antibody, 1:400). 

Antibodies used for immunoprecipitation: RAD51C (R68 rabbit antibody), 
XRCC3 (10F1 mouse monoclonal antibody), mouse IgG (Abcam 18413) and 
rabbit IgG (Abcam 46540) were used where appropriate as negative control immu- 
noprecipitates. 

siRNA oligonucleotides: RAD51 (ref. 31) 5’-AAGGGAAUUAGUGAAGCCA
AA-3', BRCA2 (ref. 32) 5'-AACAACAAUUACGAACCAAAC-3’, siRNA oligo- 
nucleotides used in DR-GFP and in HELQ_1 (this study): 5'-GAAGGUCCAA 
UAUAAUU-3’, HELQ_3 (this study): 5’-AAUGUGAGGUGAUUAAGAA-3’, 
HELQ_M®*:5'-CAAAGGAAGATTTCCTCCAACTAAA-3’, HELQ-01: 5'-GUUU 
GAAGAUUGCAACGAA-3’, HELQ-03: 5'-AAUGUGAGGUGAUUAAGAA-3’, 
HELQ-04: 5'-GGUAGAAGAGUUACUAAGA-3’, HELQ-17: 5'-GUUUGAAGA 
UUGCAACGAA-3’, XRCC2: 5'-CAGGGTACTACGCAAGCCT-3’, XRCC3: 5'- 
CAGAATTATTGCTGCAATT-3’, RAD51C: 5'-AAGAGAATGTCTCACAAAT-3’,
RAD51D: 5'-CTGGGTGGAAATAAGCTTA-3’.
siRNA transfection and ATR inhibition. U2OS cells were subjected to two rounds 
of reverse transfections using siGENOME siRNA and Dharmafect1 (Thermofisher) 
according to the manufacturer’s protocol. Thirty-six hours after the second trans-
fection, cells were treated for 14 h with 3 µM aphidicolin. For ATR inhibition, 3 µM
ATR inhibitor was added to cultures 30 min before aphidicolin treatment. 
Immunofluorescence. Cells were first washed in PBS and then fixed with 2% PFA 
at room temperature (18 °C) for 15 min, and then washed three times in PBS. The 
fixed cells were further permeabilized with 3% BSA in PBS plus 0.1% Triton X-100 
for 30 min at room temperature. Primary antibodies (RAD51; 1:500, RPA; 1:1,000, 
γH2AX) were added and incubated at 37 °C for 1 h. After washing with PBS plus
0.1% Triton X-100, secondary antibodies (provided by Jackson ImmunoResearch) 
were applied and incubated for 1 h in the dark. The stained coverslips were mounted 
with prolong Gold Antifade reagent (Invitrogen). Imaging was carried out using 
Axio Imager (Zeiss) or Axioplan 2 Imaging (Zeiss) microscope and analysed by 
Axiovision software (Zeiss). 

Homologous recombination reporter assays. DSB repair efficiency by homolog-
ous recombination was measured in DR-GFP U2OS cells as described previously.
In brief, 48h after the first round of siRNA transfection (40 nM) using Lipofec- 
tamine RNAiMAX (Invitrogen), cells were either mock-transfected (pcDNA3.1) 
or transfected with 0.6 µg of an I-SceI expression plasmid (pCBASce) together with
siRNA (20 nM) using 3.6 µl of Lipofectamine 2000 (Invitrogen). The medium was
replaced 3 h after I-SceI transfection and cells were analysed for GFP expression by
flow cytometry on a Cyan ADP (Dako) 72 h after I-SceI transfection. To confirm
siRNA efficiency, western blotting was carried out on 50 µg of NP-40 lysates plus
sonication run on 4-15% Precast SDS-PAGE gels (Bio-Rad). 

DNA combing. Replication tracts were labelled for 20 min with 20 µM iododeox-
yuridine (IdU) (in atmospheric O2 experiments, cells were treated with or without
2.5 µM CPT for the final 15 min of IdU labelling to test replication fork stalling/
restart), washed three times with PBS and labelled for 20 min with 200 µM
chlorodeoxyuridine (CldU). Cells were washed, collected on ice, counted and
embedded in agarose. Cells were digested in 2-3 changes of proteinase K buffer 
for 24 h at 50 °C. Plugs were washed for 5 × 10 min in Tris-EDTA followed by
β-agarase digestion overnight at 42 °C. Genomic DNA was combed onto silanized
coverslips (Genomic Vision) using a Molecular Combing System instrument 
(Genomic Vision), dried and stained using previously described methods”. 
Experiments were conducted on two separate sets of mutant and control cell lines 
with similar results obtained for both experiments. 

Pulsed field gel electrophoresis analysis. Immortalized embryonic fibroblasts 
were treated with or without MMC for 1 h and then allowed to recover for 16-48 h. 
Cells were collected and processed for pulsed field gel electrophoresis (PFGE) 
analysis similar to previously described methods™. In brief, cell suspensions were 
placed on ice, cell numbers counted and equivalent cell numbers of each genotype
were embedded in agarose plugs in duplicate. Cells were digested by incubating
plugs in proteinase K overnight at 50 °C, plugs were washed 4 × 1 h, sealed with
low-melting-point agarose into the well of a 1% agarose/0.5× Tris/borate/EDTA
(TBE) pulsed field gel, and run for 24h on a Gene Navigator PFGE apparatus 
(Amersham) using the following conditions: running temperature: 13 °C; running 
angle: 120° (hex electrode); connection setup: interpolation (phase 1 of 2: N/S 30s, 
E/W 30s, phase time 23 h; phase 2 of 2: N/S 5s, E/W 5s); power program: 180 V 
15 min, 170 V 30 min, 160 V 1h, 150 V 2h, 140 V 4h, 130 V 8h, 120 V 7 h. Gels 
were post-stained with ethidium bromide and washed in 0.5× TBE. PFGE was
carried out on two independent sets of mutant and control cell lines and results 
were repeated two or more times for each set of cell lines. Similar results were 
obtained across all experiments and sets of cell lines.
Extended Data Figure 1 | Allele and subfertility of HELQ deficiency.
a, Schematic of the β-geo gene trap and its approximate location within the Helq
genomic locus. b, Sequence and traces showing the exact location of the gene
trap insertion as determined by sequencing of splinkerette PCR products.
c, HELQ western blot from wild-type (+/+) and Helq mutant (ΔC/ΔC) mouse
cells, showing loss of HELQ and absence of a HELQ-βgal fusion protein
(which, if present, would be evident in the region of the blot marked with the
red bar). d, Table of observed and expected Mendelian ratios calculated from
heterozygous matings. Chi-squared analysis was used to test for deviation of
observed from expected. e, Average weights of Helq mice tracked between 2 and
12 weeks of age. Means of 5-13 mice for each group are shown, and for clarity,
s.d. is not plotted. Differences are not significant. f, Table of ovarian pathology
in 30-week-old Helq control and mutant females (black text) and heterozygous
females within the tumour watch study (blue text, 17-21 months old).
g-l, Histological sections of testes from HelqΔC/ΔC males showing various
degrees of atrophy, including: normal tubules (g), mild atrophy (h, i), pockets of
atrophy (arrows), pyknotic nuclei (asterisks); moderate atrophy (j, k), missing
spermatogenic layer (arrowheads) and severe atrophy (l), with only Sertoli cells
present.

[Extended Data Figure 2, panel residue. b, table columns: average c-KIT⁺:Sertoli ratio (s.d.) and number of mice (tubules per mouse); rows: 0.715 (0.174), 3 (20) and 0.262 (0.116), 4 (20). c, plot of testes (% of body weight) against age (days). d, table footnotes: 1, including salivary, thyroid, parathyroid, adrenal, mammary and endometrial gland adenomas and cystic hyperplasias; 2, including leiomyosarcoma and spindle cell sarcoma; *, pituitary/ovaries not identified in 1 of 26 post-mortems, therefore tumour frequency calculated from 25 animals in the mutant cohort.]

Extended Data Figure 2 | HELQ-deficient germ cell and tumour 
phenotypes. a, Immunohistochemical analysis of adult testes labelled with the
stem cell marker c-KIT⁺ (brown) to highlight spermatogonia, and
counterstained with haematoxylin to visualize remaining cells in the tubule
(blue). Two representative images from wild-type (left) and mutant (right)
mice are shown, with red circles indicating c-KIT⁺ cells. The number of c-KIT⁺
cells for each panel is indicated in the bottom right corner. Boxed regions in top
panels are magnified in bottom panels to demonstrate staining. S, Sertoli cells;
SG, c-KIT⁺ spermatogonia. b, Tabulation of average c-KIT⁺ cells per tubule
normalized to the number of Sertoli cells. c, Testis weights plotted by mouse age
in days. Linear regression was used to fit a best-fit line, and its slope was tested
for deviation from 0: R² and P values are indicated, revealing no
correlation between age and testes weight for Helq mutants. d, Table of tumour 
frequency and tumour spectrum of Helq mutant and control mice showing data 
for all mice, females, and males in the tumour watch cohort. 129/B6 
background phenomena are coloured in grey text, Helq mutant-specific effects 
are in black, and female-specific pathology is highlighted in pink.
Extended Data Figure 3 | Tumour histology of HELQ deficiency. a, The
frequency of liver steatosis in all mice, and inflammation and activated
mammary tissue in female mice. b-f, Ovary sections showing normal wild-type
ovary (b) and common ovarian pathology in mutant animals (c-f). Low-
magnification (c) and high-magnification (d) images of dysgenic ovary from a
Helq mutant exhibiting a sex cord stromal tumour containing tubular-like
structures. Low-magnification (e) and high-magnification (f) images of large
nodular granulosa cell tumour from a Helq mutant. Arrowheads indicate
mitotic figures (f). g-j, Pituitary sections showing low-magnification (g) and
high-magnification (h) images of normal wild-type pituitary. Low-
magnification (i) and high-magnification (j) images of pituitary tumour from a
Helq mutant mouse. Arrows indicate boundary where large, haemorrhagic
pituitary adenoma compresses overlying brain (h).
Extended Data Figure 4 | Characterization of HelqΔC/ΔC bone marrow and
generation of Helq Fancd2 double-mutant offspring. a, HelqΔC/ΔC and
control bone marrow cells were isolated and exposed to MMC at the indicated
doses and clonogenic survival of haematopoietic progenitors was plotted as
percentage of surviving cells relative to untreated. Means ± s.e.m. for three
mice per genotype are shown. b-i, Bone marrow (BM) from mutant and
control mice was isolated and subjected to various haematopoietic stem and
progenitor cell analyses: tabulation of bone marrow LSK (lineage⁻ Sca-1⁺
c-Kit⁺) cell populations (b, c); bone marrow c.f.u.-c (colony-forming units in
culture) assays (d); bone marrow day-28 cobblestone area-forming cells
(CAFCs; e); and total donor-derived leukocyte (f), myeloid (g) and lymphoid
(h, i) engraftment upon bone marrow transplantation. Raw data (symbols) and
means (horizontal lines) from three mice are plotted (b, c, e); means ± s.e.m.
for three mice per genotype (d); and means ± s.e.m. for 6-10 recipients for
each genotype (f-i). j, siRNA-treated U2OS cells were plated for
clonogenic survival and treated with the indicated reagents. k, Observed and
expected Mendelian ratios calculated from Helq+/ΔC Fancd2+/− double
heterozygous matings. Chi-square analysis was used to test for deviation of
observed from expected.
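
The chi-square comparison of observed and expected Mendelian ratios can be sketched as a goodness-of-fit calculation. The example below is illustrative only: the function names and offspring counts are hypothetical, and it uses the single-locus 1:2:1 ratio for simplicity (a double heterozygous cross would use the nine-class ratio 1:2:1:2:4:2:1:2:1 with 8 degrees of freedom).

```python
import math


def chi_square_goodness_of_fit(observed, expected_ratio):
    """Return (chi-square statistic, degrees of freedom).

    `observed` are offspring counts per genotype class; `expected_ratio`
    gives the Mendelian ratio for the same classes, e.g. [1, 2, 1].
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def p_value_two_df(statistic):
    """Upper-tail P value, valid only for 2 degrees of freedom.

    For df = 2 the chi-square survival function is exactly exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


# Hypothetical offspring counts: wild type, heterozygous, homozygous mutant.
stat, df = chi_square_goodness_of_fit([30, 50, 20], [1, 2, 1])
p_value = p_value_two_df(stat)
```

For other degrees of freedom the P value would come from a chi-square distribution function in a statistics package rather than from the closed form used here.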
Extended Data Figure 5 | HELQ mass spectrometry, its relationship with
the RAD51 paralogues, ATR, and overexpression. a, HELQ purification
scheme and SDS-PAGE gel showing proteins co-purified with HELQ-Flag and
control Flag immunoprecipitates. b, Cells treated with the indicated siRNAs
were collected and probed for HELQ and the RAD51 paralogues. c, XRCC3
immunoprecipitated from HELQ-Flag and Flag control cell lysates and probed
for Flag, XRCC3 and RAD51C (positive XRCC3 interacting protein control).
IgG was used as a negative control. d, HeLa cells transiently expressing
recombinant HELQ-GFP (green panels) fixed and stained with DAPI (blue
panels) to identify nuclei. Two examples of spontaneous nuclear aggregation
patterns are shown: small focal aggregates (right) and large filamentous
aggregates (left). e, Chromatin fractions of HELQ-GFP cells treated with or
without 2 µM APH for 24 h. f, Cells treated as in e, with or without 3 mM ATR
inhibitor (ATRi). Quantification of HELQ in chromatin fractions normalized
to H3 (right). g, Cells treated with or without 100 ng ml⁻¹ MMC for 24 h and
fractionated as in e.
Extended Data Figure 6 | Spontaneous defects, checkpoint indices, damage
foci and clonogenic survival of HELQ-deficient cells. a, Primary Helq mutant
and control cell lines were grown in physiological O2 for the indicated number
of days and passaged regularly to generate a cumulative population doubling
(CPD) curve. Means ± s.d. of triplicate replicas are shown. b, c, Helq mutant
and control cells grown on coverslips were formaldehyde fixed and DAPI
stained to determine levels of spontaneous micronuclei formation: percentage
of 100 cells exhibiting 1 or more micronucleus (b); representative images
(c), micronuclei (arrows). d, e, DNA combing used to calculate replication fork
rates of primary cells grown under atmospheric (d) or physiological (e) O2.
Cells in d were treated with or without 2.5 µM CPT for 15 min during labelling.
f, Examples of origin containing IdU (green)- and CldU (red)-labelled fibres.
g, Right versus left replication tract lengths to determine fork asymmetry
(defined as tracts falling outside the interquartile lines).
Extended Data Figure 7 | Checkpoint and double-strand break repair
function. a, b, Immortalized HelqΔC/ΔC and Helq+/+ cells treated with or
without 500 ng ml⁻¹ MMC (a) or 50 nM CPT (b) for 20 h and probed for the
indicated checkpoint indices. c, Primary wild-type (WT), HELQ-deficient (HQ)
and FANCD2-deficient (D2) cells were left untreated (−), or exposed to 5 Gy
irradiation and collected 30 min later, or 3 µM APH for 16 h and harvested 10 min
later, and lysates were probed for the indicated checkpoint indices. d, Phospho-Ser
345 CHK1 levels in U2OS cells subjected to 1 µM MMC for 24 h. e, Immortalized
mouse cells were treated with 1 µM MMC for 24 h, allowed to recover for the
indicated times (in hours), and stained for γH2AX. f, Quantification of
percentage of positive cells from e. g, RPA32 and RAD51 foci formation in
U2OS cells ± 1 µM MMC for 24 h subjected to control and HELQ siRNA.
Extended Data Figure 8 | Homologous recombination dynamics and PARP
inhibitor sensitivity. a, U2OS cells treated with or without 100 ng ml⁻¹ MMC
for 24 h, allowed to recover for the indicated times (in hours), fractionated and
probed for RAD51. H4 and α-tubulin are shown as controls for chromatin
fractionation. b, DR-GFP reporter cells treated with the indicated siRNAs were
probed for BRCA2, HELQ and RAD51; transcription factor IIH (TFIIH) is
shown as a loading control. c, siRNA-treated U2OS cells were plated for
clonogenic survival and treated with the indicated reagents.

Hidden specificity in an apparently nonspecific RNA-binding protein

Ulf-Peter Guenther, Lindsay E. Yandek, Courtney N. Niland, Frank E. Campbell,
David Anderson, Vernon E. Anderson, Michael E. Harris & Eckhard Jankowsky

Nucleic-acid-binding proteins are generally viewed as either specific
or nonspecific, depending on characteristics of their binding sites in
DNA or RNA. Most studies have focused on specific proteins,
which identify cognate sites by binding with highest affinities to
regions with defined signatures in sequence, structure or both.
Proteins that bind to sites devoid of defined sequence or structure
signatures are considered nonspecific. Substrate binding by these
proteins is poorly understood, and it is not known to what extent
seemingly nonspecific proteins discriminate between different bind-
ing sites, aside from those sequestered by nucleic acid structures.
Here we systematically examine substrate binding by the apparently
nonspecific RNA-binding protein C5, and find clear discrimination
between different binding site variants. C5 is the protein subunit of
the transfer RNA processing ribonucleoprotein enzyme RNase P
from Escherichia coli. The protein binds 5’ leaders of precursor
tRNAs at a site without sequence or structure signatures. We mea-
sure functional binding of C5 to all possible sequence variants in its
substrate binding site, using a high-throughput sequencing kine-
tics approach (HITS-KIN) that simultaneously follows processing
of thousands of RNA species. C5 binds different substrate variants
with affinities varying by orders of magnitude. The distribution of
functional affinities of C5 for all substrate variants resembles affinity
distributions of highly specific nucleic acid binding proteins. Unlike
these specific proteins, C5 does not bind its physiological RNA targets
with the highest affinity, but with affinities near the median of the
distribution, a region that is not associated with a sequence signature.
We delineate defined rules governing substrate recognition by C5, which
reveal specificity that is hidden in cellular substrates for RNase P.
Our findings suggest that apparently nonspecific and specific RNA-
binding modes may not differ fundamentally, but represent distinct
parts of common affinity distributions.

The term ‘nonspecific’ is widely used to describe proteins that bind 
DNA or RNA substrates at sites without apparent sequence or struc- 
ture signatures'*°. Although nonspecific proteins are numerous and 
have many important biological roles, a key open question is whether 
the absence of defined recognition elements in nucleic-acid-binding 
sites reflects largely indiscriminate substrate binding, or whether and 
how nonspecific proteins discriminate between different binding sites. 
To answer this question, we systematically examined substrate binding 
for the apparently nonspecific RNA-binding protein C5, the protein 
subunit of RNase P from E. coli. RNase P is a ribonucleoprotein enzyme 
that removes 5’ leader sequences from precursor tRNA (ptRNA) in 
bacteria’ (Fig. 1a). The C5 protein promotes ptRNA processing by 
RNase P®, and contributes to ptRNA binding by associating with six 
consecutive nucleotides in the 5’ ptRNA leaders”"® (Fig. 1a, b). This 
binding site displays no apparent sequence or structure signatures in 
the 87 genomically encoded E. coli ptRNA leaders (Extended Data Fig. 1). 

To determine whether and how C5 discriminates between different 
binding sites, we measured functional binding of C5 to all sequence 
variants in its cognate ptRNA site. Here, functional binding reflects 
Figure 1 | Processing of precursor tRNA with randomized leader sequences.
a, ptRNA processing reaction by RNase P. b, Structure of the RNase P
holoenzyme. c, Sequences of non-initiator ptRNAMet leaders (reference, black;
randomized, red). The tRNA body is omitted for clarity. The arrow indicates
the cleavage site. d, Time courses of RNase P processing of ptRNAMet82 (black) and
ptRNAMet(-3-8N) (red), in the presence (filled circles) and in the absence (open
circles) of C5. The solid lines are fits to the integrated rate equation for a biphasic
first-order reaction. e, Polyacrylamide gel electrophoresis (PAGE) of reactions
processed for Illumina sequencing. f, Distributions of species for individual
time points, ranked from fastest to slowest. The y axis marks the change in read
numbers for each substrate species at the reaction time indicated, normalized to
the number of reads at t = 0. Colours emphasize the different reaction times.
productive substrate association in an ongoing enzymatic reaction. It is
expressed by the specificity constant (kcat/Km, the ratio of turnover
number and Michaelis constant) for a given substrate variant, which
measures biologically relevant specificity. To determine functional
binding of C5 to all substrate variants simultaneously, we generated
non-initiator precursor tRNAMet with a randomized C5-binding site
(ptRNAMet(-3-8N), Fig. 1c), and followed the processing reaction of this
substrate population (Fig. 1d). Reactions were conducted with excess
ptRNAMet(-3-8N). Under these multiple turnover conditions all sequence
variants compete for C5 association, and the relative reaction rate for
each variant reflects functional binding.

The time course for the reaction of the randomized ptRNAMet(-3-8N)
population differed markedly from the time course of ptRNAMet82
with a genomically encoded leader (Fig. 1d). This difference indicates 
that sequence variation affects functional binding by C5. Removal of 
C5 slowed the reaction rate as expected and greatly diminished the kinetic 
differences between the substrates with the genomically encoded and 
the randomized leaders (Fig. 1d). 

To determine reaction rate constants for the individual substrate 
variants, we isolated remaining substrates at various reaction times and 
measured the distribution of the RNA species by Illumina sequencing 
(Fig. 1e, f, Extended Data Fig. 2 and Extended Data Table 1). We used
primers with degenerate barcodes to detect biased amplification of 
sequences during the PCR (Extended Data Fig. 2 and Extended Data 
Table 1). Of the 4,096 sequence variants, 2,900 showed unbiased amp- 
lification and were retained for further analysis. The distribution of 
sequence variants changed over the reaction time, revealing distinct fast- 
and slow-reacting species (Fig. 1f). These data demonstrate that C5 
discriminates between different sequence variants, despite the lack of 
sequence signatures in genomically encoded E. coli ptRNA leaders. 

We calculated a relative processing rate constant (k^rel) for each RNA
variant, using internal competition analysis, developed for the evaluation
of kinetic isotope effects (Extended Data Fig. 3). The k^rel value
is the ratio between the kcat/Km values for the given sequence variant
and our reference sequence, the physiological leader AAAAAG. The relative
rate constants for all sequence variants describe C5 binding to the
entire sequence space of the six-nucleotide recognition site. Our approach 
to measure functional binding of large numbers of substrates during an 
ongoing reaction adds a kinetic dimension to the scope of high-throughput 
sequencing experiments with randomized RNA populations**"*"”. We 
therefore propose to term our method high-throughput sequencing 
kinetics (HITS-KIN). The approach is applicable to other systems for 
kinetic analysis of next generation sequencing data. 

For the ptRNA processing reaction with C5, the HITS-KIN method 
revealed a range of relative rate constants spanning several orders of 
magnitude (Fig. 2a). Obtained relative rate constants were highly repro- 
ducible in independent experiments (Fig. 2b). We also validated rate con- 
stants by direct kinetic measurements of selected sequence variants (Fig. 2c 
and Extended Data Fig. 4). Together, these data show that the HITS-KIN 
approach provides reproducible and accurate relative rate constants. 

We next plotted the number of sequence variants processed at a 
given range of relative rate constants (Fig. 2d). The resulting histogram 
revealed that a significant number of sequence variants reacted faster 
than the physiological leader reference (k^rel > 1). Numerous sequence
variants reacted slower (k^rel < 1). These observations indicate that physiological
leader sequences of non-initiator ptRNAMet are not preferentially
bound by C5. Removal of C5 greatly contracted the range of relative
rate constants, highlighting the impact of C5 on functional substrate 
binding and on the characteristic affinity distribution (Extended Data 
Fig. 5). 

Most notably, the shape of the distribution of functional C5 affini- 
ties closely resembled affinity distributions of highly specific DNA- 
binding proteins, for which large numbers of sequence variants had been 
examined'**’ (Fig. 2d). This degree of similarity between the non- 
specific C5 and specific proteins was unexpected, given the absence 
of sequence signatures in the C5 binding site. For specific proteins, the 
Figure 2 | Discrimination of C5 between different precursor tRNAMet
leader sequences. a, Relative rate constants (k^rel) for processing of all ptRNA
leader sequence variants, ranked from slow to fast. Relative rate constants are
averaged from four values (two time points of two experiments) and shown
only for sequences where data from all four measurements passed quality control
criteria (Extended Data Table 1). The line at k^rel = 1 marks the reference
sequence. b, Correlation of relative rate constants from two independent
biological replicates (red line, linear fit through the data; R², correlation
coefficient). c, Correlation between relative rate constants obtained by PAGE
and by the HITS-KIN approach for selected sequence variants. Error bars
represent the s.d. of three or more individual measurements. d, Distribution of
relative rate constants for processing of ptRNAMet(-3-8N) sequence variants by
C5 (black) and apparent affinities for DNA binding by the transcription factor
Arid3a, indicated as Z-scores based on published microarray data. The
Z-score is not identical to k^rel values, but accurately reflects affinity-based
ranking of all sequences (triangles, k^rel values for genomic leader sequences of
ptRNAMet). e, Plot of all sequence variants ranked from slowest to fastest
processed. The bracket marks 0.3% of sequence variants with the largest relative
rate constants. f, Sequence logo for this fraction.


cellular substrates that define binding site signatures are found at the
high-affinity tail of the distribution (Extended Data Fig. 6a, b). Remarkably,
this high-affinity region for C5 also shows a clear sequence signature
(Fig. 2e, f), as seen for specific proteins. In stark contrast to specific proteins,
the C5 sequence signature does not correspond to the physiological
binding sites on the non-initiator ptRNAMet. None of the genomically
encoded non-initiator ptRNAMet leader sequences falls into this
fastest-reacting fraction (Fig. 2d). For both C5 and specific proteins, no sequence
signatures were detected for other regions of the sequence spectrum 
(Extended Data Fig. 6). Our results therefore reveal remarkable simi- 
larities between sequence discrimination by the apparently nonspecific 
C5 and by specific DNA-binding proteins. At the same time, our data 
Figure 3 | Rules for sequence discrimination by C5. a, Correlation between
observed k^rel and values calculated with the best fit of the data to models of
increasing complexity. Logarithmic k^rel values are used because of their
correspondence to differences in binding energies. R² expresses the


highlight a major difference: sequences bound with the highest affinity
do not represent physiological substrates for C5, whereas they do for specific
DNA-binding proteins with known affinity distributions.

To delineate sequence determinants that govern substrate recogni- 
tion by C5, we fit the distribution of rate constants to models of increa- 
sing complexity and determined which percentage of the measured 
variance in the rate constants was explained by the respective model. 
Our simplest model considered only the number of a given nucleotide 
in the binding site, regardless of position. This model explained 29% of 
the variance in the measured rate constants (Fig. 3a, left). The model 
suggested favourable binding of sequences rich in adenine and uracil 
(Extended Data Fig. 7a). As A-U base pairs are thermodynamically less 
stable than G-C base pairs, we speculate that the variance explained by 
this model reflects in part the propensity of the leader to form transient 
structures with other parts of the ptRNA”, which potentially compete 
with C5 binding. Although competing structures are generally expected 
in RNAs with more than two dozen nucleotides”, the relatively low 
correlation of the model with measured rate constants suggests that 
competing RNA structures have only limited impact on C5 binding for 
the majority of sequences. 

We next considered both base identity and position in the binding 
site. This model, a traditional position weight matrix”’, explained 39% 
of the variance in measured rate constants (Fig. 3a, middle, and Extended 
Data Fig. 7b). This modest improvement over the previous model indi- 
cated that the position of individual bases in the binding site impacted C5 
binding only to a limited extent. However, the position weight matrix 
assesses the bases independently of each other”’. To probe inter-dependence 
of the bases in the binding site, we used a model accounting for func- 
tional coupling between two bases. This model explained 68% of the 
variance in measured rate constants (Fig. 3a, right). The strongest 
couplings were detected between neighbouring bases (Fig. 3b). 

The observed strength of the couplings between adjacent bases did
not scale with energies expected to overcome stacking of the respective
bases. This finding suggests that the couplings result from interactions
of the RNA with C5, not primarily from inherent RNA conformations.
Functional couplings between more than two base positions,
assessed by neural network analysis, only modestly improved correlation
between predicted and measured data, and explained 76% of the
variance (Extended Data Fig. 8). Thus, functional couplings between
adjacent bases exert the largest influence on C5 binding. The limited
resolution of the structural model of RNase P protein bound to RNA
currently precludes structural interpretation of these effects. However,
we note that functional coupling between neighbouring bases also
contributes markedly to the binding of several specific transcription
factors to DNA.

Taken together, the examination of the functional binding data with 
models of increasing complexity reveals defined rules for substrate bind- 
ing by C5. The data demonstrate that discrimination between different 
substrates, and thus specificity, is an inherent property of C5. However, 


correlation of each model with measured processing rate constants. 

b, Functional coupling between two base positions. Yellow squares show 
promotion of processing (high linear coefficients), black squares indicate small 
or no effects, blue squares mark inhibition of processing. 


this specificity is 'hidden' in the cellular RNA targets. This observation
raises the question of why the specificity in C5 has not led to selection of
ptRNA leaders with high-affinity sequence signatures, as seen in proteins
with canonical specificity. Our data suggest a further-reaching utility
of specificity. C5 uses its inherent specificity, as reflected in the rules for
substrate recognition, to enable binding of diverse substrate variants
with similar functional affinity. This enables RNase P to process these
diverse substrates at a similar rate, which may be required for cellular
tRNA homeostasis.

The marked similarities between affinity distributions of C5 and 
those of highly specific transcription factors also raise questions about 
the concept of ‘nonspecific’ RNA-binding proteins. Given that RNA 
binding requires a protein interface to establish interactions with the 
RNA, certain RNA sequence or structure variants conceivably fit this 
interface better than others. Genuine nonspecificity may therefore be 
difficult to accomplish, even for proteins binding exclusively to the RNA 
backbone, because sequence differences impact backbone geometry”. 
Differences between substrate variants may become smaller for proteins 
that bind to the backbone of RNA duplexes, which show less structural 
heterogeneity, but are nevertheless dynamic”. 

Preferences of apparently nonspecific proteins for certain binding-site
variants are thus likely to impact substrate selection, unless compensation
for these preferences exists. Compensation may arise from varying
concentrations of RNA species, rate-determining metabolic steps other
than substrate binding, or a combination of these. Alternatively, a single
protein could bind multiple distinct substrate regions while thermodynamically
compensating for the preferences at each region, as shown for
uniform binding of diverse aminoacyl-tRNAs to elongation factor Tu.

Although hidden specificity remains to be revealed for other proteins, 
the findings for C5 indicate that absence of sequence or structure sig- 
natures in cellular binding sites does not reflect an inability to discrimi- 
nate between different RNA binding sites. At the same time, the data 
highlight the key difference between the hidden specificity of C5 and 
proteins that are specific in a canonical sense. For proteins with canonical 
specificity, cellular substrates seem to fall mainly into the high-affinity 
region of the sequence distribution. This region is associated with sequence 
signatures, even for C5. Biological substrates for C5 bind near the median 
of the affinity distribution, which does not produce a sequence signature. 
These findings suggest that specific and nonspecific binding modes 
may not fundamentally differ, but represent distinct parts of similar 
affinity distributions. Our data therefore have potentially broad impli- 
cations for RNA binding by proteins thought to be nonspecific, inclu- 
ding many RNases, RNA helicases or the La-protein. 


METHODS SUMMARY 

ptRNAs and ptRNAMet with randomized leader sequences were produced by in vitro
transcription from PCR-generated templates. RNase P processing reactions were
carried out with 1 µM ptRNA and 5 nM RNase P holoenzyme (equimolar RNase P
RNA and C5). Product and unreacted ptRNA were separated by PAGE. Complementary
DNA libraries for Illumina sequencing were prepared from unreacted ptRNA at
each given time point. Primers with degenerate barcodes were used to detect biased
PCR amplification of certain sequences. Sequencing was performed on an Illumina
GA2. Relative rate constants k^rel for individual substrate variants were calculated
from changes in the distribution of substrates over time, using a multiple turnover
reaction scheme for competitive substrate kinetics, which was extended to several
thousand substrates. Computational modelling for the rules of substrate discrimination
was performed by ordinary least squares regression of the matrix of values
for ln(k^rel) for each sequence variant according to four models of increasing
complexity. The quality of the different models was judged by the correlation coefficient
between a data set calculated from values obtained from the regression analysis and
the set of experimentally obtained values for ln(k^rel).


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


E. coli RNase P holoenzyme and RNase P RNA were prepared and tested for
integrity as described previously. The non-initiator ptRNAMet substrates contain
8 nucleotides of the genomically encoded leader (Fig. 1c), and 21 nucleotides at the
5′ end for Illumina sequencing (Extended Data Fig. 2). These RNAs were generated
by in vitro transcription from DNA generated by PCR amplification of the
ptRNAMet82 gene (pMET82). The forward primer introduced the T7 promoter
sequence and the additional 21 nucleotides (Extended Data Fig. 4). The ptRNAMet(-3-8N)
substrate population with randomized leader sequence N(-3) to N(-8) was generated
using a primer with this region randomized (NNNNNN).

The following PCR primers were used (C5 binding site is underlined): 

ptRNAMet82F, 5′-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAAAAGATGGCTACGTAGCTCAGTTGG-3′;
ptRNAMet82F(EcoRI), 5′-GGGTTAACCTAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGAAAAAGATGGCTACGTAGCTCAGTTGG-3′;
ptRNAMetF(randomized), 5′-TAATACGACTCACTATAGGGAGACCGGAATTCAGATTGATGNNNNNNATGGCTACGTAGCTCAGTTGG-3′;
ptRNAMet82R, 5′-TGGTGGCTACGACGGGATTC-3′;
ptRNAMetR(BbsI), 5′-CGGGATCCGAAGACAGTGGTGGCTACGACGGGATTC-3′.
DNA templates for substrates L1 to L5 (Fig. 2c) contained the
following C5-binding sites: L1, TTATAT; L2, TCAGAC; L3, ATTCAA; L4, CGTCAG;
L5, CTCCTG.

PCR protocol: 95 °C, 2 min; 30 cycles (95 °C, 30s at 55°C, 45s at 72 °C), final 
extension at 72 °C for 5 min. 

The PCR products (142 base pairs (bp)) were extracted with phenol and chloroform
and recovered by ethanol precipitation. PCR products for the ptRNAMet82
DNA were amplified with the ptRNAMet82F(EcoRI) and ptRNAMetR(BbsI) primers, which
include BamHI and EcoRI restriction sites. The PCR product was digested with these
enzymes and cloned into pUC19. The resulting plasmid, pPTRNAmetT(+21), was
digested with BbsI to yield the template for in vitro transcription with the correct
ptRNAMet82 3′ end.

In vitro transcription was performed in a volume of 400 µl with 15-20 µg of PCR
template or cloned plasmid DNA template, 400 enzyme units of T7 RNA polymerase
(Ambion), 0.01 unit yeast pyrophosphatase and 0.5 mM NTP, in the reaction
buffer supplied by the polymerase manufacturer, supplemented with 2.5 mM
MgCl2. Reactions were incubated overnight at 37 °C. The full-length RNA was
purified on 8% denaturing PAGE, as described previously.

Recovered ptRNAs were dephosphorylated using calf intestinal phosphatase and
5′ end labelled with 32P using γ-32P-ATP and T4 polynucleotide kinase according to
standard methods. For the HITS-KIN experiments, the RNA was uniformly labelled
with γ-32P-GTP in the in vitro transcription (NTPs 100 µM).
RNase P processing reactions. Multiple turnover reactions were performed in
buffer containing 50 mM Tris-HCl, pH 8.0, 100 mM NaCl, 17.5 mM MgCl2, 0.005%
Triton X-100, with 1 µM ptRNA and 5 nM E. coli RNase P holoenzyme (1:1 ratio of
P RNA and C5 protein). Equal volumes (40 µl) of enzyme and radiolabelled
substrate at two times their final concentrations were prepared in reaction buffer
and combined to initiate the reaction. Aliquots (5 µl) were removed at the times
indicated for 5% to 30% substrate conversion. The reactions were quenched by
addition of a solution (5 µl) containing formamide and 100 mM EDTA. ptRNA
and reaction products were resolved on 10% denaturing PAGE (Fig. 1e). The
fraction of substrate converted to product was determined with a PhosphorImager
(GE) and the ImageQuant software. Subsequently, precursor bands in the gel were
located by exposure to X-ray film. The bands were excised and eluted as described
previously. Eluted RNA was extracted with phenol and chloroform, and recovered
by ethanol precipitation.

Relative rate constants for individual non-initiator ptRNAMet substrates (L1-L5,
Fig. 2c, for defined sequences of C5 binding site, see above) were determined in
reactions containing 1 µM of the pool of randomized ptRNAMet(-3-8N), spiked with
trace amounts (<0.1 nM) of the respective radiolabelled L1-L5 substrate. Time
courses of the reactions were followed as described above and apparent rate constants
were determined from plots of product accumulation over time. As outlined
below, the ratio of the observed rate constants is k^rel because in competition
kinetics the substrates at the concentrations used behave as V/K systems. V/K is
proportional to kcat/Km at a given substrate concentration.

DNA library preparation. Complementary DNA libraries for Illumina sequencing
were prepared from unreacted ptRNA, recovered from PAGE as described above.
RNA was resuspended in 25 µl H2O, and concentration was determined with a
Beckman ultraviolet spectrophotometer. First-strand synthesis was performed with
4 pmol of this RNA in a 20-µl standard reaction containing 1 µM reverse transcription
primer (Extended Data Fig. 2a) and 0.5 µl Superscript III (Invitrogen) for 10 min
at 42 °C, 40 min at 50 °C and 20 min at 55 °C. The reaction was stopped by incubation
at 95 °C for 5 min. The generated cDNA was diluted (1:300). One microlitre of this
solution was used in PCR reactions with 1.25 enzyme units Herculase polymerase
(Stratagene), reverse transcription primer (0.5 µM) and indexed forward primer
(0.5 µM) for 2 min at 98 °C, followed by 19 cycles (15 s at 98 °C, 20 s at 59 °C, 20 s at
72 °C), and incubation for 10 min at 72 °C. PCR products were purified with P6
microcentrifuge columns (Bio-Rad) and analysed by agarose gel electrophoresis
(Extended Data Fig. 2b). The solutions were pooled in an equimolar fashion and
sequenced in a single lane of an Illumina GA2, according to the manufacturer's
protocols. Primer sequences were as follows: reverse transcription primer,
5′-CAAGCAGAAGACGGCATACGATGGTGGCTACGACGGGAT-3′; indexed forward
primers (NN, degenerate barcode; underlined letters, index barcode),
5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNATCGGGAGACCGGAATTCAGATTG-3′;
5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNGATGGGAGACCGGAATTCAGATTG-3′;
5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNCGAGGGAGACCGGAATTCAGATTG-3′;
5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNTCCGGGAGACCGGAATTCAGATTG-3′.

Processing of Illumina sequencing data. All reads were aligned to the sequence 
of nucleotides 6-29 of the read (Extended Data Fig. 2c), permitting one mismatch 
but no gaps, using the basic local alignment search tool (BLAST). Aligned reads 
were then sorted according to their index tag, and separated into different files. For 
corresponding statistics, see Extended Data Table 1. 

We probed possible over- or under-amplification of certain sequences during
the PCR using the two-nucleotide degenerate barcode (positions 1 and 2, Extended
Data Fig. 2c). Correctly amplified sequences show a distribution of degenerate
barcode nucleotide combinations that is identical, within error, for all leader species.
Both over- and under-amplification cause deviations from this distribution. We
determined the distribution of all nucleotide combinations at positions 1 and 2 for
each leader sequence. The expected distribution of the two-nucleotide degenerate
barcode was calculated from all 4,096 leader sequence variants. Then, chi-squared
tests were performed for each leader variant. Sequences that deviated significantly
from the expected distribution at a threshold of α = 0.05 were excluded from
further analysis (between 4% and 10% of the sequence variants, Extended Data Table 1).
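The barcode filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the expected distribution is taken as the pooled barcode counts over all leaders, and significance is judged against the tabulated chi-squared critical value for 15 degrees of freedom (16 barcode combinations minus one) at a level of 0.05. The original analysis may have differed in these details.

```python
from collections import Counter
from itertools import product

# All 16 possible two-nucleotide degenerate barcodes (read positions 1 and 2).
BARCODES = ["".join(p) for p in product("ACGT", repeat=2)]

# Tabulated chi-squared critical value for 15 degrees of freedom at alpha = 0.05
# (assumption: a fixed-threshold test rather than an exact p-value).
CHI2_CRIT_DF15_ALPHA05 = 24.996


def expected_fractions(counts_by_leader):
    """Pool barcode counts over all leaders to estimate the expected distribution."""
    pooled = Counter()
    for counts in counts_by_leader.values():
        pooled.update(counts)
    total = sum(pooled[b] for b in BARCODES)
    return {b: pooled[b] / total for b in BARCODES}


def chi2_statistic(counts, expected):
    """Pearson chi-squared statistic of one leader against the expected fractions."""
    n = sum(counts.get(b, 0) for b in BARCODES)
    if n == 0:
        return float("inf")  # no reads: treat as failing the filter
    return sum(
        (counts.get(b, 0) - n * expected[b]) ** 2 / (n * expected[b])
        for b in BARCODES
        if expected[b] > 0
    )


def unbiased_leaders(counts_by_leader, critical=CHI2_CRIT_DF15_ALPHA05):
    """Return the leaders whose barcode distribution does not deviate significantly."""
    expected = expected_fractions(counts_by_leader)
    return {
        leader
        for leader, counts in counts_by_leader.items()
        if chi2_statistic(counts, expected) <= critical
    }
```

Here `counts_by_leader` maps each six-nucleotide leader to a dictionary of barcode read counts; leaders that pass the filter would then enter the rate-constant calculation.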
Determination of relative rate constants k^rel from Illumina sequencing data.
Rate constants for individual substrate variants were calculated from time-dependent
changes of the distribution of substrate variants (Fig. 1f), using a multiple turnover
reaction scheme for competitive alternative substrate kinetics (Extended Data
Fig. 3).

The observed rate constant (v_1) for processing of one individual substrate (S_1)
is proportional to the fraction of total enzyme that binds this substrate to form a
complex (ES_1) that further reacts to form product and regenerates free enzyme
according to:

$$v_1 = V_1\, E\, f_{ES_1} \tag{1}$$


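Here, V_1 is the first-order rate constant for the reaction of ES_1 to yield free product
(Extended Data Fig. 3), and f_ES1 is the fraction of total (active) enzyme (E) in the ES_1
complex. Additional substrates act essentially as competitive inhibitors of the
multiple turnover reaction. Accordingly, for two competing substrates S_1 and S_2
(with K_1 and K_2 the respective Michaelis constants), v is:

$$v_1 = \frac{V_1\, E\, S_1/K_1}{1 + S_1/K_1 + S_2/K_2} \tag{2}$$

$$v_2 = \frac{V_2\, E\, S_2/K_2}{1 + S_1/K_1 + S_2/K_2} \tag{3}$$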


As the denominators of equations (2) and (3) are the same, the ratio of two
observed rate constants (v_2/v_1) therefore becomes:

$$\frac{v_2}{v_1} = \frac{(V/K)_2\, S_2}{(V/K)_1\, S_1} \tag{4}$$

We define the parameter k^rel as the ratio of the V/K values of a given substrate (S_2)
to a reference substrate (S_1):

$$k^{\mathrm{rel}} = \frac{(V/K)_2}{(V/K)_1} \tag{5}$$




and thus:

$$\frac{v_2}{v_1} = k^{\mathrm{rel}}\, \frac{S_2}{S_1} \tag{6}$$

The reference substrate S_1 is the genomically encoded leader sequence for the
ptRNAMet82 (AAAAAG). Thus, k^rel > 1 for a ptRNA variant that reacts faster
than the reference substrate (V_2/K_2 > V_1/K_1), whereas k^rel < 1 indicates a slower
reaction (V_2/K_2 < V_1/K_1).

Equations (4) to (6) highlight three important points regarding the use of 
internal competition kinetics for the analysis of deep sequencing data. First, both 
substrates will behave as V/K systems regardless of the substrate concentrations.
This is true even if one or both concentrations are greater than the respective
values for Km, because both substrates compete for free enzyme. Second, the
ratio of observed rate constants and the ratio of V/K values are independent of 
enzyme concentration, provided the steady state conditions are maintained. Third, 
the reaction step that limits V/K does not have to be the same for both substrates. 

Integration of equation (5) over time ensures validity of the expression for any
reaction interval, and we obtain:

$$k^{\mathrm{rel}} = \frac{\ln\left(S_2/S_{2,0}\right)}{\ln\left(S_1/S_{1,0}\right)} \tag{7}$$

Here, S_{1,0} and S_{2,0} are the initial concentrations of the two substrates. S_1 (reference
substrate) and S_2 (the specific sequence variant) are the respective concentrations
at a defined time interval. The quantities that can be measured are the relative
concentrations of S_2 and S_1; that is, S_2/S_1 and S_{2,0}/S_{1,0}. We define these quantities as
the ratios (R) between substrates:


$$R_i = \frac{S_i}{S_1} \tag{8}$$

$$R_{i,0} = \frac{S_{i,0}}{S_{1,0}} \tag{9}$$

The initial mole fractions (X_i) of S_i are defined as:

$$X_i = \frac{S_{i,0}}{\sum_{j=1}^{n} S_{j,0}} \tag{10}$$

S_{i,0} is the concentration of a given substrate at the reaction start, and S_i is the
concentration at a time point where the overall reaction amplitude for the entire substrate
population has reached the value f. We obtain:

$$f = 1 - \frac{\sum_{i=1}^{n} S_i}{\sum_{i=1}^{n} S_{i,0}} \tag{11}$$

Analogous to the treatment of kinetic isotope effects using internal competition in
a previous publication, we insert the defined mole fractions and substrate ratios
(equations (8) to (10)) into equation (11). This is rearranged, and the result is the
following equation:

$$\frac{S_1}{S_{1,0}} = \frac{1 - f}{\sum_{i=1}^{n} \dfrac{R_i}{R_{i,0}}\, X_i} \tag{12}$$



We substitute this term in equation (7), and consider that substrate ratios at time
zero equal one.

$$\frac{S_i}{S_{i,0}} = \frac{R_i}{R_{i,0}} \cdot \frac{S_1}{S_{1,0}} \tag{13}$$

We obtain the following expression for the relative rate constant for any substrate,
S_i:

$$k^{\mathrm{rel}}_i = \frac{\ln\left[\dfrac{R_i}{R_{i,0}} \cdot \dfrac{1 - f}{\sum_{j=1}^{n} \frac{R_j}{R_{j,0}}\, X_j}\right]}{\ln\left[\dfrac{1 - f}{\sum_{j=1}^{n} \frac{R_j}{R_{j,0}}\, X_j}\right]} \tag{14}$$

Here, R is the ratio of each sequence (including S_1) to S_1, R_0 is the ratio of each
variant to S_{1,0} at the reaction start, and X is the mole fraction for a given sequence
variant. The method outlined above is applicable to any technique capable of
determining substrate ratios (for example, mass spectrometry, isotopic counting,
chromatography).



We computed R and X values for each substrate using the filtered number of
counts for each variant, obtained from Illumina sequencing (Supplementary Table 2).
The overall fraction of reacted product was determined by PhosphorImager analysis
of the PAGE (Fig. 1e).
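As a rough illustration of how relative rate constants could be computed from read counts, the sketch below assumes the standard competition-kinetics relation k^rel = ln(S_i/S_i,0) / ln(S_1/S_1,0), with the remaining fraction of the reference substrate obtained from the overall reacted fraction f by mass balance over all variants. The function name and the dictionary-based inputs are hypothetical, and this is not the authors' code.

```python
import math


def relative_rate_constants(counts_t0, counts_t, f, reference):
    """Estimate k_rel per variant from read counts at t = 0 and at time t.

    counts_t0, counts_t: dicts mapping variant -> read count (any common scale).
    f: overall fraction of substrate converted to product (0 < f < 1).
    reference: key of the reference variant (k_rel = 1 by definition).
    """
    ref0, ref_t = counts_t0[reference], counts_t[reference]
    total0 = sum(counts_t0.values())

    # R_i / R_i,0: change of each variant relative to the reference.
    ratio = {s: (counts_t[s] / ref_t) / (counts_t0[s] / ref0) for s in counts_t0}
    # X_i: initial mole fraction of each variant.
    x = {s: counts_t0[s] / total0 for s in counts_t0}

    # Remaining fraction of the reference, S_1 / S_1,0, from mass balance.
    ref_remaining = (1.0 - f) / sum(ratio[s] * x[s] for s in counts_t0)
    ln_ref = math.log(ref_remaining)  # zero if nothing has reacted; f must be > 0

    # k_rel = ln(S_i / S_i,0) / ln(S_1 / S_1,0)
    return {s: math.log(ratio[s] * ref_remaining) / ln_ref for s in counts_t0}
```

Because only ratios of counts enter the calculation, differences in sequencing depth between time points cancel; the absolute scale is supplied by the independently measured f.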

In principle, values for k^rel can be computed at any value of f. However, there is
little relative change in the number of sequencing reads at early time points.
At early time points the highest resolution is seen for the fastest reacting variants,
while k^rel values for slower sequences are optimally measured at greater values of f.
Values of f = 0.1 to 0.3 provided reliable measurements for most sequence variants.
Nevertheless, for slow-reacting variants small changes in the number of sequencing
reads at early time points are occasionally exceeded by sampling error in the
number of reads, resulting in negative values for k^rel.
Computational modelling of rules for substrate discrimination. With regard to
nucleotide type, this model considers the number of each nucleobase in the binding
site, regardless of its position. For each sequence variant the corresponding
value of ln(k^rel) (Fig. 3a) is described by a set of linear coefficients (β), according to
the equation:

$$\ln(k^{\mathrm{rel}}) = \beta_0 + \beta_A A + \beta_C C + \beta_G G \tag{15}$$

A, C and G are the numbers of the respective nucleobases (explanatory variables). The
number of U follows from these variables and is therefore not included (β_U = 0).
Each coefficient in equation (15) describes the average increase in ln(k^rel) corresponding
to a one-unit increase in the explanatory variable. For example, for each additional C in the
sequence, ln(k^rel) increases by β_C.

Linear coefficients for the entire data set were computed by ordinary least squares
(OLS) regression, using the open-source statistical package R (http://www.r-project.org/)
and the exact equation:

$$\hat{\beta} = (X^{T} X)^{-1} X^{T} Y \tag{16}$$

Here, Y is the vector of outcomes ln(k^rel) and X is the matrix of explanatory variables
(A, C, G). X^T is X transposed, and the superscript -1 denotes the matrix inverse.
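The composition model and its least-squares solution can be illustrated with a small self-contained Python sketch. The original analysis used R; the code below is not that analysis but a minimal stand-in that builds the design matrix (intercept plus counts of A, C and G) and solves the normal equations directly. All function names are hypothetical.

```python
def design_row(seq):
    """Intercept plus counts of A, C and G; U is the implicit baseline."""
    return [1.0, float(seq.count("A")), float(seq.count("C")), float(seq.count("G"))]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)
```

Given a list of leader sequences `seqs` and the matching list of ln(k^rel) values `y`, `ols([design_row(s) for s in seqs], y)` returns the coefficients in the order intercept, A, C, G.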

To compare predicted and measured ln(k^rel) values for all sequence variants
(Fig. 3a), we calculated predicted values using equation (15), plotted these versus
the corresponding measured values, and determined the correlation between the measured
and calculated data sets. The coefficient of correlation (R²) was computed
according to:

$$R^2 = 1 - \frac{S_{\mathrm{error}}}{S_{\mathrm{total}}} \tag{17}$$

S_error is the sum of squared errors and measures the error, or unexplained variance,
in the regression. The error is the distance from each point to the regression line,
and is calculated for each data point, squared, and summed, according to:

$$S_{\mathrm{error}} = \sum_{i} (y_i - f_i)^2 \tag{18}$$

where y_i is the measured value and f_i the value predicted by the model.
S_total is the total sum of squares, which is proportional to the sample variance
and is calculated according to:

$$S_{\mathrm{total}} = \sum_{i} (y_i - \bar{y})^2 \tag{19}$$

Position weight matrix. To examine how the position of different nucleobases in
the binding site affects the reaction rate, we included the position of each base in
the regression model. This model has 18 explanatory variables, three for each of
the six positions, as explained above. Each variable is 1 if the respective base is at a
given position, and 0 otherwise. For example, G4 will be 1 if the fourth base is a G,
and 0 otherwise. We used the reference sequence as baseline, and therefore did not
include its variables in the regression. Calculations of linear coefficients
(Extended Data Fig. 7b) and the comparison between predicted and measured
ln(k^rel) values (Fig. 3a) were performed as described above.
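
The indicator coding described above can be sketched as follows. This is an illustration under stated assumptions: the reference sequence AAAAAG is taken from the legend of Extended Data Fig. 7, the alphabet is written with U, and the names `pwm_variable_names` and `encode` are hypothetical.

```python
REFERENCE = "AAAAAG"  # reference sequence named in the Extended Data Fig. 7 legend
BASES = "ACGU"


def pwm_variable_names(reference=REFERENCE):
    """Three indicator variables per position: every base except the reference base."""
    return [
        f"{base}{pos}"
        for pos, ref_base in enumerate(reference, start=1)
        for base in BASES
        if base != ref_base
    ]


def encode(seq, reference=REFERENCE):
    """Return the 0/1 vector of explanatory variables for one sequence variant."""
    present = {f"{base}{pos}" for pos, base in enumerate(seq, start=1)}
    return [1 if name in present else 0 for name in pwm_variable_names(reference)]
```

With this coding the reference sequence maps to a vector of zeros, so its ln(k^rel) is carried by the intercept alone and every coefficient is read relative to it.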
Functional couplings between bases. We created interaction variables of the 
same form as in the two-dimensional model 2, but these interaction terms were 
composed of two bases. For example, A1G4 is 1 if the first base is A and the fourth 
base is G, and will be 0 otherwise. We then performed calculations with the two- 
dimensional model 240 times, each time adding a separate interaction term. Inter- 
actions whose effect was statistically significant (P< 10°) were retained, other 
interactions were not considered further. Next, we included all statistically signifi- 
cant interactions in one model, which was further pared using stepwise regression”*. 
This approach yielded a model similar to the position weight matrix augmented 
with the 44 most significant interaction terms. Calculations of linear coefficients
(Fig. 3b) and the comparison between predicted and measured ln(k^rel) values
(Fig. 3a) were performed as described above.
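
A short sketch of how such pairwise interaction variables might be enumerated and evaluated. It assumes that a term is formed for every pair of positions and every pair of bases; that gives 15 × 16 = 240 terms, which matches the number of runs reported above but is an inference rather than something the text states. The function names are hypothetical.

```python
from itertools import combinations, product

BASES = "ACGU"
POSITIONS = range(1, 7)  # six positions in the binding site


def interaction_names():
    """All terms of the form A1G4: base at one position paired with base at a later one."""
    return [
        f"{b1}{p1}{b2}{p2}"
        for p1, p2 in combinations(POSITIONS, 2)
        for b1, b2 in product(BASES, repeat=2)
    ]


def interaction_value(name, seq):
    """1 if both bases named in the term occur at their positions, 0 otherwise."""
    b1, p1, b2, p2 = name[0], int(name[1]), name[2], int(name[3])
    return int(seq[p1 - 1] == b1 and seq[p2 - 1] == b2)
```

For example, `interaction_value("A1G4", "AAAGAG")` evaluates the A1G4 term used as the example in the text.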

Neural network analysis. Analysis was performed with the MATLAB Neural 
Networks Toolbox (v. 3.0). Data input was identical to the two-dimensional model 
above. Data were fit to a three-layer feed-forward network with 13 hidden nodes. 
Interaction terms are not necessary in this model, because the neural network
learns the interaction patterns from the raw sequence data. The resulting network
was used to generate estimates for the reaction rate for each base sequence. Neural
nets were trained on 60% of the data, validated on 20% of the data and tested on the
remaining 20%. Almost identical R^2 values were obtained for both the 20% hold-out
sample and the entire data set.
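
The 60/20/20 partition can be illustrated with a simple random split. This sketch does not reproduce the MATLAB toolbox behaviour; the function name `split_60_20_20`, the fixed seed and the use of a uniform random shuffle are all assumptions.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle the items and return (training, validation, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.6 * len(shuffled))
    n_val = int(0.2 * len(shuffled))
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, about 20%
    return train, validation, test
```

Holding the test portion out of both training and validation is what allows its R^2 to be compared with the R^2 for the entire data set.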
Extended Data Figure 1 | C5 binding site in the 87 ptRNA leaders in E. coli.
a-c, Alignment and sequence logos for the C5 binding site in all 87 ptRNA
leaders encoded by E. coli. Binding of C5 to the consecutive ptRNA positions
−3 to −8 is well established, based on a crystal structure and biochemical
evidence; that is, looping of bases, as seen for certain RNA- and DNA-binding
proteins, does not occur with C5. Consistent with this idea, we did not detect
any sequence motif with the MEME software when including positions −1 to
−10. a, Sequence alignment. Sequences were aligned with CLUSTAL. Coloured
squares indicate the bases (C, blue; A, green; U, red; G, black). Anticodon, the
anticodon recognized by the tRNA; tRNA#, the tRNA identification number;
tRNA type, the amino acid. b, Sequence logo depicting the probability of
any base at a given position, based on the alignment in a. The logo was
generated with Weblogo. c, Sequence logo for the information content of the
alignment in a. The logo was generated with Weblogo.
[Logo axes: Position (−8 to −3); Bits]
Extended Data Figure 2 | Preparation of DNA libraries for Illumina
sequencing. a, BAR, the indexing barcode; NN, the degenerated barcode. For
primer sequences see Methods. RT, reverse transcription. b, DNA libraries
(PCR products, a) for samples at the time points indicated. Controls: lane 5, no
RNA; lane 6, no reverse transcriptase. c, Read structure. Nucleotides 1 and 2 are
degenerated barcode; nucleotides 3-5 are sample barcode (index tag);
nucleotides 6-29 are additional leader sequence; nucleotides 30-35 are
randomized leader sequence; nucleotides 38 onwards are tRNA.
[Read shown in the figure: NNBARGGGAGACCGGAATTCAGATTGATGNNNNNNATGG 3′]
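
The read structure in panel c translates directly into string slices. The sketch below is illustrative only: the function name `parse_read` and the field names are hypothetical, and nucleotides 36 and 37, which the legend does not assign to a field, are left out.

```python
def parse_read(read):
    """Split a read into the fields listed in the legend (positions are 1-based)."""
    return {
        "degenerate_barcode": read[0:2],   # nucleotides 1-2
        "sample_barcode": read[2:5],       # nucleotides 3-5 (index tag)
        "leader_constant": read[5:29],     # nucleotides 6-29
        "leader_randomized": read[29:35],  # nucleotides 30-35
        "trna": read[37:],                 # nucleotide 38 onwards
    }
```

Grouping reads by `sample_barcode` and counting each `leader_randomized` value would then give per-variant read counts for each time point.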
Extended Data Figure 3 | Multiple turnover reaction scheme. E, enzyme;
ES_{1...i}, individual enzyme substrate complexes; K_{1...i}, individual functional
binding constants; S_{1...i}, individual substrate variants; V_{1...i}, individual reaction
rate constants.
Extended Data Figure 4 | Effect of the 21-nucleotide extension on ptRNA
processing by RNase P. a, Relative processing rate constants were measured 
for three sequence variants from different parts of the affinity distribution by 
PAGE. Reactions for each sequence variant were conducted in the presence of 
the randomized population (unlabelled) with equal amounts of substrate with 
(S/21) and without the 21-nucleotide extension (S/nL). The asterisk marks the 
position of the radiolabel at the 5’ end of the substrate. Reactions were 
conducted under the conditions described in the Methods. b, PAGE for the 
reaction of the reference sequence variant. The time point at 5 min is marked 
for reference. c, The effects of the 21-nucleotide extension on relative 
processing rate constants of the three indicated sequence variants. The position 
of each sequence variant in the affinity distribution of all sequence variants 
(Fig. 2d) is given for reference by the vertical line above the plot. The number 
indicates the factor (S/nL)/(S/21) by which the 21-nucleotide extension 
decreases the relative rate constant of the given sequence variant, given as 
average from three independent experiments. The horizontal line 
approximates the degree of the relative change. The 21-nucleotide extension 
decreases the k^rel observed for sequence variant (CTCCTG) by a factor of 2.3. For
the genomically encoded leader sequence AAAAAG, the 21-nucleotide
extension decreases k^rel by a factor of 0.95; that is, the substrate with the
extension reacts slightly faster than the substrate without extension. The fast 
reacting substrate (TTATAT) is also only minimally affected by the extension 
(0.92). Together, the data show only minor effects of the 21-nucleotide 
extension on the position of a given sequence variant in the affinity distribution. 
Extended Data Figure 5 | Processing of ptRNA^Met(82) by RNase P without
C5. Distribution of k^rel values for processing of ptRNA^Met(82) by RNase P
without C5 (black line). Data were obtained analogously to those with C5. For
comparison, the distribution of k^rel values with C5 is shown (red line).
Extended Data Figure 6 | Sequence logos are only associated with the
high-affinity tail of the distribution. a, Plot of sequence variants ranked from
weakest to tightest binder to the specific transcription factor Arid3a (Fig. 2d),
based on data published previously. To facilitate direct comparison to the
six-nucleotide binding site of C5, only approximately half of all sequences are
shown in the plot, and only six positions (positions two to seven, as indicated)
of the eight-nucleotide binding site are shown. The position in the binding site
is marked on the right. The brackets mark 0.1% of sequence variants
(33 sequences) that bind tightest, fall into the medium, and bind weakest.
Sequence logos show the information content in these sequences. The logos
were generated with Weblogo. Sequence signatures of the tightest binding
variants are highly enriched in physiological substrates of Arid3a. b, Plot of
sequence variants ranked from weakest to tightest binder to another specific
transcription factor, Hnf4a, based on data published previously.
Approximately half of all sequences are shown in the plot, and six positions
(positions two to seven, as indicated) of the eight-nucleotide binding site.
Sequence signatures of the tightest binding variants are highly enriched in
physiological substrates of Hnf4a. c, Plot of sequence variants ranked from
slowest to fastest reacting for C5 (Fig. 2e). The brackets mark 1% of sequence
variants that react fastest, fall into the medium and react slowest. Sequence
logos were generated as in a.
Extended Data Figure 7 | Sequence determinants for substrate recognition
by C5. a, Model considering identity, but not position of a given base in the
C5 binding site. Ranking of the four bases according to their potential to
promote (positive linear coefficient) or decrease (negative linear coefficient)
functional C5 binding. For calculation of linear coefficients, see the Methods.
b, Position weight matrix (PWM) model considering both base identity and
position in the binding site, but assuming independent contributions of each
position. The plot shows the ranking of the bases according to their potential to
promote (positive linear coefficient) or decrease (negative linear coefficient)
functional C5 binding, relative to the reference sequence (AAAAAG, Fig. 1c).
Bases are coloured as in a. For the calculation of linear coefficients, see
the Methods.
Extended Data Figure 8 | Neural network analysis. Correlation between
observed k^rel and values calculated with the best model obtained by neural
network analysis (Methods).
Extended Data Table 1 | Sequencing data.

Replicate 1
Time point                              T0          T1          T5          T40
Time (min:sec)                          0:00        1:19        5:28        40:00
Fraction ptRNA processed*               0           0.14        0.28        0.55
Total sequence reads (number)           2,828,358   4,212,150   4,603,710   4,882,095
Reads passed quality threshold†         2,646,624   3,849,190   4,391,105   4,676,773
Fract. reads below quality threshold    0.064       0.086       0.0462      0.042

Replicate 2
Time point                              T0          T0.5        T1          T5
Time (min:sec)                          0:00        0:37        1:15        5:15
Fraction ptRNA processed*               0           0.08        0.12        0.28
Total sequence reads (number)           6,434,248   8,172,493   11,054,769  11,604,173
Reads passed quality threshold†         5,800,933   7,476,616   10,341,132  10,421,304
Fract. reads below quality threshold    0.098       0.085       0.064       0.101

The overall number of reads obtained for the respective time points in the independent replicate experiments 1 and 2, and the number of reads that passed the quality control. *The fraction of processed ptRNA was
determined from PAGE data (Fig. 1e and Methods). †Read numbers after filtering for potential PCR artefacts for each sequence variant (Methods).
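
The last row of each replicate can be recomputed from the two rows above it, which is a quick consistency check on the table. The sketch below assumes that the fraction is simply 1 minus (reads passed / total reads); the names `TABLE` and `fraction_below_threshold` are hypothetical.

```python
# (total reads, reads passing the quality threshold, fraction reported in the table)
TABLE = {
    ("replicate 1", "T0"): (2_828_358, 2_646_624, 0.064),
    ("replicate 1", "T1"): (4_212_150, 3_849_190, 0.086),
    ("replicate 1", "T5"): (4_603_710, 4_391_105, 0.0462),
    ("replicate 1", "T40"): (4_882_095, 4_676_773, 0.042),
    ("replicate 2", "T0"): (6_434_248, 5_800_933, 0.098),
    ("replicate 2", "T0.5"): (8_172_493, 7_476_616, 0.085),
    ("replicate 2", "T1"): (11_054_769, 10_341_132, 0.064),
    ("replicate 2", "T5"): (11_604_173, 10_421_304, 0.101),
}


def fraction_below_threshold(total, passed):
    """Fraction of reads that did not pass the quality threshold."""
    return 1.0 - passed / total


for key, (total, passed, reported) in TABLE.items():
    computed = fraction_below_threshold(total, passed)
    print(key, f"computed={computed:.4f}", f"reported={reported}")
```

Under this assumption the recomputed fractions agree with the reported ones to within about 0.001.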
Migrating bubble during break-induced replication
drives conservative DNA synthesis


Natalie Saini, Sreejith Ramakrishnan, Rajula Elango, Sandeep Ayyar, Yu Zhang, Angela Deem, Grzegorz Ira,
James E. Haber, Kirill S. Lobachev & Anna Malkova

The repair of chromosomal double strand breaks (DSBs) is crucial 
for the maintenance of genomic integrity. However, the repair of 
DSBs can also destabilize the genome by causing mutations and 
chromosomal rearrangements, the driving forces for carcinogenesis 
and hereditary diseases. Break-induced replication (BIR) is one of 
the DSB repair pathways that is highly prone to genetic instability’. 
BIR proceeds by invasion of one broken end into a homologous 
DNA sequence followed by replication that can copy hundreds of 
kilobases of DNA from a donor molecule all the way through its 
telomere*”. The resulting repaired chromosome comes at a great cost 
to the cell, as BIR promotes mutagenesis, loss of heterozygosity, trans- 
locations, and copy number variations, all hallmarks of carcinogenesis*?. 
BIR uses most known replication proteins to copy large portions of 
DNA, similar to S-phase replication’®"’. It has therefore been sug- 
gested that BIR proceeds by semiconservative replication; however, 
the model of a bona fide, stable replication fork contradicts the 
known instabilities associated with BIR such as a 1,000-fold increase 
in mutation rate compared to normal replication’. Here we demon- 
strate that in budding yeast the mechanism of replication during 
BIR is significantly different from S-phase replication, as it proceeds 
via an unusual bubble-like replication fork that results in conser- 
vative inheritance of the new genetic material. We provide evidence 
that this atypical mode of DNA replication, dependent on Pif1 helicase,
is responsible for the marked increase in BIR-associated mutations.
We propose that the BIR mode of synthesis presents a powerful
mechanism that can initiate bursts of genetic instability in eukar- 
yotes, including humans. 

In theory, BIR may constitute a unidirectional, bona fide repli- 
cation fork producing two semiconservatively replicated molecules*" 
(Fig. 1A, a). Alternatively, a D-loop (displacement loop) formed by 
invasion of the broken chromosome may persist throughout BIR, 
migrating down the length of the chromosome, creating an unusual 
condition of conservative inheritance of newly synthesized DNA*** 
(Fig. 1A, b-d).

To distinguish between these models, we used a disomic yeast sys- 
tem (Fig. 1B, a) containing a second, truncated copy of chromosome 
III, cleaved by HO endonuclease under control of a galactose-inducible
promoter’. The HO-induced DSB possesses only one efficiently repair- 
able end that invades the second copy of chromosome III, and initiates 
BIR that copies over 100 kilobases (kb) of the distal part of the chro- 
mosome. Using this system, we recently demonstrated that BIR stimu- 
lates mutagenesis along the path of DNA synthesis at a series of lys2 
frameshift reporters. Here we examined these Lys+ mutations to determine
whether errors during BIR were acquired semiconservatively
(inherited by either the donor or recipient molecule; Fig. 1B, b) or 
conservatively (inherited only by the recipient molecule; Fig. 1B, c). 
Pulsed-field gel electrophoresis (PFGE) was used to separate donor and
recipient molecules from Lys+ BIR outcomes resulting from mutations


in a lys2 reporter located 16 or 36 kb distal to the site of BIR initiation 
(Fig. 2a, b). Sequencing of the polymerase chain reaction (PCR) products 
derived from the separated chromosomes revealed that the great majo- 
rity of heterozygous frameshift mutations (58 of 58 and 68 of 77 from 
strains with reporters at 16 and 36 kb, respectively) were inherited by the 
recipient molecule, whereas the donor sequence remained unchanged 
(see also Supplementary Discussion). Overall, the mutation pattern sup- 
ports a conservative replication mechanism for BIR. However, because 
Figure 1 | The mode of DNA synthesis during BIR. A, The models of BIR.
a, Replication fork proceeds semiconservatively. b-d, Migrating bubble leads to
conservative inheritance of new DNA. Synchronous (b) and asynchronous
(c, d) synthesis of leading and lagging DNA strands. B, a, The BIR frameshift
mutation assay. A DSB is induced at MATa of the recipient chromosome III.
The lys2 reporter is inserted in the donor chromosome 16 or 36 kb
telomere-proximal from MATα-inc. Lys+ mutations would be inherited equally by the
donor (D) or recipient (R) if BIR is semiconservative (b), but only by recipient if
BIR is conservative (c).
Figure 2 | BIR-induced mutations. a, The sequencing of the separated donor
and recipient chromosomes of heterozygous Lys+ mutants. b, The effect of
pif1Δ on BIR-induced frameshifts. Medians of mutation rates are shown. The
arrows represent a reduction as compared to wild type (WT). c, The assay to
study BIR-induced base substitutions in ura3-29 reporter. d, Depending on
orientation, the selectable position of ura3-29 leading strand includes cytosine
(C) or guanine (G). e, MMS amplifies BIR-induced base substitutions in an
orientation-dependent way. The arrows indicate an increase as compared to
no-MMS control. See Extended Data Tables 1 and 2 for the details of sample
size, number of experiments, statistical analysis and for the ranges of medians
shown in e and b.


this conclusion was based on analysis of selected mutation events, we 
developed a non-selective test to analyse BIR microscopically by DNA 
combing. 

The experiments were conducted in nocodazole-arrested cells of 
disomic BIR strain bearing a cassette facilitating BrdU incorporation 
in yeast’* (Fig. 3a, b). BrdU was added 3.5 h after DSB induction. After 
completion of BIR, PFGE-separated donor and recipient molecules 
(Fig. 3c and Extended Data Fig. 1) were analysed by molecular comb- 
ing and fluorescent in situ hybridization (FISH). We used an anti- 
BrdU antibody, the P1 probe specific to the tandem repeat of TEF1/ 
BSD inserted 14kb centromere-proximal to MAT in the donor chro- 
mosome, the P2 probe specific to the 20-kb region of chromosome III 
where invasion occurs, and the P3 probe specific to the 15-kb region 
near the telomere (Fig. 3a) to characterize BIR. We observed BrdU 
tracts approximately 100 kb in length in 70 of the 98 repaired recipient 
molecules analysed (Fig. 3d, e and Extended Data Fig. 2a). These tracts 
include the entire chromosome region marked by P2 and P3 and, 
therefore, represent BIR that copied the donor chromosome through 
to its telomere. Additionally, 14% of recipient molecules contained 
long (>30 kb) BrdU tracts that overlapped with P2 but not with P3 
(Fig. 3d and Extended Data Fig. 3a). These molecules probably repres- 
ent repair events where BIR was interrupted, resulting in half-crossover 
formation” (see also Supplementary Discussion). 

Our analysis of donor molecules supports a conservative mode of 
DNA replication during BIR, as only 4 out of 103 donor molecules 
were illuminated by >30 kb BrdU tracts (Fig. 3d, e and Extended Data
Fig. 2a). These data confirm a strong bias (P < 0.0001) towards BrdU 
tracts present only in the recipient chromosome. The four cases of 
BrdU incorporation in the donor could result from rare semiconser- 
vative synthesis or from BIR initiated >30 kb proximal to the DSB site, 
which would result in a donor-like size and hybridization pattern due 
to copying of regions unique to the donor molecule’®. On the basis of 
these data, we estimate that, even if semiconservative synthesis occurs, 
it can account for no more than 8% of the BIR events that we analysed 
(see Supplementary Discussion and Extended Data Fig. 4 for the 
results of another series of experiments supporting this conclusion). 
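
The bias towards BrdU tracts in the recipient can be illustrated with a one-sided Fisher exact test on the counts quoted above (70 of 98 recipient molecules with tracts of about 100 kb, 4 of 103 donor molecules with tracts longer than 30 kb). The text does not say which test produced P < 0.0001, so the choice of test here is an assumption, as is the function name `fisher_one_sided`.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Hypergeometric tail: tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


# Rows: recipient, donor. Columns: with long BrdU tract, without.
p_value = fisher_one_sided(70, 28, 4, 99)
print(f"one-sided P = {p_value:.3g}")
```

Under these assumptions the P value falls well below the 0.0001 threshold quoted in the text.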
The unusual mode of replication prompted us to characterize the 
structure of BIR molecular intermediates at LYS2 inserted ~16kb 
from the point of strand invasion. Genomic DNA extracted from 
nocodazole-arrested cells undergoing BIR was digested with PstI and 
analysed by two-dimensional (2D) gel electrophoresis using a LYS2- 
specific probe (Fig. 4a, top panel). We detected bubble-like structures 
between 3 and 7h after DSB induction (Fig. 4b-d), but not at 13h, 
consistent with the timing of BIR progression’ (Extended Data Fig. 5). 
All bubble-like intermediates were markedly different from the Y 
structures indicative of S-phase replication forks observed before addi- 
tion of nocodazole and induction of the break (Fig. 4c, 0h). Furthermore, 
no bubble-like structures were observed in control strains in which HO 
endonuclease cannot initiate a DSB (Fig. 4d, no-cut), thus linking 
these structures to BIR exclusively. The bubble-like structures observed 
in BIR were reminiscent of bubbles routinely detected at replication 
origins'’, with one important difference: the BIR bubbles included a 
long, high-molecular-mass tail that extended well beyond the size 
expected for complete replication (arrows in Fig. 4c, d). We proposed 
that initiation of BIR lagging-strand synthesis is often delayed com- 
pared to leading strand, resulting in accumulation of single-stranded 
DNA (ssDNA) behind the BIR bubble, which makes the region around 
LYS2 refractory to PstI digestion. Indeed, pre-incubation of genomic 
DNA with oligonucleotides (PstO3 and PstO4; Fig. 4a, middle and 
bottom panels) complementary to the Watson strand of two PstI sites
flanking the LYS2 gene eliminated the tail and resulted in a second arc 
that probably corresponds to molecular intermediates with bubbles 
consisting of one double-stranded branch (leading-strand synthesis) 
and one single-stranded branch (lagging-strand synthesis) (Fig. 4a, b, d 
and Extended Data Fig. 6). Similar results were also obtained using 
BglII digestion (Extended Data Fig. 7). Notably, whereas simultaneous 
addition of oligonucleotides BglO3 and BglO4, complementary to the
Watson strand of two BglII sites, eliminated the ssDNA tail, the addition
of each of these oligonucleotides individually failed to eliminate
the tail. This confirms that two types of DNA intermediates contribute 
to the observed ssDNA tail: those containing ssDNA centromere proxi- 
mal to LYS2 and those with ssDNA distal to LYS2 (Fig. 4a and Extended 
to the Crick strand did not have any effect (data not shown). Bubble 
migration intermediates were also detected with an HPH-specific probe 
that hybridizes to the end of the donor chromosome (Fig. 4a, e). These 
data strongly support a migrating D-loop type of DNA replication’*”. 
We proposed that ssDNA accumulated behind the migrating BIR 
bubble is the cause of BIR-associated mutagenesis because of the pro- 
pensity of ssDNA to accumulate unrepaired DNA lesions”. This was 
tested by using methyl methanesulphonate (MMS), a DNA damaging 
agent that predominantly creates mutagenic lesions in cytosines of 
ssDNA’!”’. In addition, a ura3-29 reporter’, which can revert to 
Ura+ via three different base substitutions at one C·G pair (Fig. 2c),
was inserted in the donor chromosome in two different orientations
(Ori1 and Ori2). We expected that MMS would specifically elevate the
level of BIR-associated mutagenesis in Ori2, where cytosine is located
in the mutant position of the leading (ssDNA) strand, but not in Ori1,
which contains guanine instead (Fig. 2d). Indeed, we observed that
even though BIR markedly stimulated base substitutions in ura3-29 
irrespective of its orientations, the effect of MMS was orientation 
Figure 3 | DNA synthesis during BIR is conservative. a, Experimental
system to assay BIR using dynamic molecular combing including the
position of hybridization probes P1, P2 and P3. b, Left: BrdU incorporation
in the recipient is expected from conservative BIR (red). Right: formation of
half-crossovers in pif1Δ leads to short patches of BrdU in the recipient.
c, Donor and recipient chromosomes separated using PFGE. d, The
summary of molecular combing analysis. e, The donors and recipients of
wild-type (PIF1) and pif1Δ. Each molecule was hybridized with P1, P2 and
P3 probes (green tracts) and treated with anti-BrdU antibodies (red tracts).
See Extended Data Figs 2a and 3 for the details of analysis, sample sizes and
images of combing analysis. See also Extended Data Fig. 4 for additional
experiments supporting the conclusions.

Figure 4 | Molecular intermediates of BIR. a, D-loop migration during
coordinated (top) and uncoordinated (bottom and middle) leading- and
lagging-strand synthesis. b, Schematic of 2D gel with BIR bubbles forming
an arc (1, 2) with an extension (3) representing ssDNA tail. Annealing with
PstO3 and PstO4 allows PstI digestion changing the mobility of the
intermediate (red, 2'). c, 2D analysis of Y-arc during normal replication
(0 h) and bubble-like structures at time points after BIR induction
hybridized to LYS2-specific probe. Similar bubble structures were observed
in nine other independent experiments. d, High-molecular-mass tails
(arrows) disappear after annealing with PstO3 and PstO4. The arc is absent
in no-cut controls. e, BIR intermediates highlighted with HPH-specific
probe.

dependent (Fig. 2e and Extended Data Table 1). Specifically, MMS highly
amplified BIR-induced mutagenesis in cells containing ura3-29 in Ori2,
whereas its effect on BIR mutagenesis in Ori1 was relatively modest.
This observation supports the conjecture that ssDNA accumulated
behind the BIR bubble is the cause of BIR-associated mutagenesis.
Additionally, the spectrum of BIR-induced mutations was also orientation
dependent, supporting our conclusion (Extended Data Fig. 2b).

Because the Pif1 helicase is a key component of the BIR machinery
(see also accompanying paper), we proposed that Pif1 is essential for
long-range BIR. We observed that even though BIR-sized products
were formed in the pif1Δ mutant (Extended Data Fig. 1a, b), no extended
BrdU tracts were observed in either the donor or recipient chromosomes
(Fig. 3d, e). In addition, approximately 22% of recipient molecules
contained short (<20 kb) BrdU patches that co-localized with
probe P2 (Fig. 3d, e and Extended Data Fig. 2a) and probably represented
DNA synthesis that was prematurely terminated. Therefore, it
is likely that most outcomes in pif1Δ mutants formed during the time
frame of these experiments were half-crossovers (Extended Data Fig.
1c), supporting our hypothesis that Pif1 is required for BIR-associated
DNA synthesis. The low amount of BIR precluded 2D analysis of BIR
intermediates in pif1Δ. We investigated whether Pif1 may be necessary
for BIR-induced frameshift mutations. Notably, we observed that all
BIR-induced frameshift mutations were eliminated in the pif1Δ
mutant at the 36-kb position, and there was a 20-fold reduction in
frameshift mutations at the 16-kb position (Fig. 2b and Extended Data
Table 2). Thus, whereas BIR may initiate in the absence of Pif1, these
data support the conclusion that Pif1 is required for long-range synthesis during BIR.
Therefore, Pif1 can be added to the list of other previously identified
proteins, including Polδ, Polζ, Msh2, Mlh1, Dun1 and others that are
involved in BIR and associated mutagenesis*”””». 

Overall, the results of this study demonstrate that BIR is carried out 
by a migrating replication bubble driven by Pif1 with asynchronous
synthesis of leading and lagging strands resulting in a mutation-prone 
accumulation of ssDNA, and leads to conservative inheritance of the 
new genetic material. The bubble migration mechanism and assoc- 
iated mutagenesis may be relevant to cellular processes where BIR has 
been implicated, such as alternative telomere lengthening and
mitochondrial DNA maintenance, where Pif1 has an important role.
An intriguing possibility is that the burst of mutations recently linked 
to replication stress/fork collapse in pre-cancerous cells*' may be 
linked to conservative synthesis initiated by BIR. 


METHODS SUMMARY 


All yeast strains listed in Extended Data Table 3 were isogenic to AM1003 (ref. 2). 
The BrdU cassette was a gift from O. Aparicio and was inserted into chromosome
V. The pTEF1/BSD-snt1 plasmid was integrated at SNT1 of the MATα-inc
containing copy of chromosome III. DSBs were initiated by HO induction by
addition of galactose’. Molecular intermediates and products of BIR were analysed 
using cells collected from time-course experiments. 2D gel electrophoresis was 
used to determine the structure of molecular intermediates. BIR outcomes were 
analysed using dynamic molecular combing and fluorescent in situ hybridization. 
BIR-induced Lys+ mutations were analysed by sequencing following separation of
chromosomes by PFGE. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
Media, strains and plasmids. All yeast strains (Extended Data Table 3) were
isogenic to AM1003 (ref. 2), which is a chromosome III disome with the following
genotype: hmlΔ::ADE1/hmlΔ::ADE3 MATa-LEU2-tel/MATα-inc hmrΔ::HPH
FS2Δ::NAT/FS2 leu2/leu2-3,112 thr4 ura3-52 ade3::GAL::HO ade1 met13.

AM1291 and AM1482 are derivatives of AM1003 and were created by deleting 
LYS2 from its native location, and inserting lys2-Ins A,(A,4) at different positions of 
chromosome III. AM2191 and AM2198 were constructed from AM1291 and
AM1482 by replacement of PIF1 with KANMX module. Control strains
AM1449, AM1649, AM2247 and AM2257, which contained no HO cut site in 
the recipient chromosome III, were obtained from AM1291, AM1482, AM2191 
and AM2198 as previously described’. AM2439 and AM2438 were created by 
integrating three and two copies of TEF1/BSD-snt1 into SNT1 of AM1291 and 
AM1482, respectively. The TEF1/BSD-snt1 plasmid was constructed by cloning of 
a PCR-amplified 1-kb region of SNT1 (from 185626 to 186589 positions of chro- 
mosome III) into the BamHI/HindIII fragment of TEF1/BSD (Invitrogen). The 
resulting plasmid was linearized by SnaBI and integrated at SNT1 to introduce a 
donor-specific region into the MATα-inc containing copy of chromosome III. The
selection of transformants with integration of multiple copies of the plasmid was 
achieved by PFGE followed by Southern hybridization with TEF1/BSD used as a 
probe. AM2118 was isogenic to AM1247 (ref. 9), but contained KANMX module 
at chromosome II between PTC4 and TPS1. 

AM2110 is a derivative of AM1003, and was created by deleting URA3 (using 
delitto perfetto approach) and replacing hmr::HPH with hmr::KANMx. In addi- 
tion, it contains lys2-InsA,4 inserted at SED4 (36 kb centromere distal to MATa- 
inc). AM2161 and AM2820 were derivatives of AM2110 where ura3-29-HPH 
fragments (Ori1 and Ori2, respectively) were inserted 16 kb centromere distal to
MATα-inc between RSC6 and THR4. The ura3-29-HPH cassettes containing the
ura3-29 allele in two orientations were a gift from Y. Pavlov. The insertion of
ura3-29-HPH 16 kb centromere distal to MATα-inc was achieved by transformation of
AM2110 with DNA fragments generated by PCR amplification of ura3-29-HPH 
using the following primers with targeting tails (uppercase) and ura3-29-HPH 
amplification sequence (lower case): 5’-TCTTTCTGCAATTATTGCACGCCTC 
CTCGTGAGTAGTGACCGTGCGAACAAAAGAGTCATTACAACGAGGAAA 
TAGAAGAagtcagtgagcgaggaage-3’ and 5'-ATATTTGCTGCTATACTACCAAA 
TGGAAAAATATAAGATACACAATATAGATAGTATTAAAAAAACGTGTAT 
ACGTTATTattgtactgagagtgcacc-3’. Control strains AM2442, AM2259 and AM2842, 
which contained no HO cut site in the recipient chromosome III, were obtained 
from AM2118, AM2161 and AM2820 as previously described’. 

AM2406 is a derivative of AM1003 that was constructed by inserting BrdU 
cassette (with the human equilibrative nucleoside transporter (hENT1) and the 
herpes simplex virus thymidine kinase)"* into URA3 to facilitate efficient BrdU 
incorporation in yeast. In particular, the p306-BrdU plasmid was linearized with
StuI and inserted by transformation into the URA3 gene (chromosome V). In
addition, AM2406 contained insertion of three tandem arrays of the
TEF1/BSD-snt1 at SNT1, and replacement of TPS1 with a KANMX module. TPS1 was deleted
to reduce accumulation of trehalose, which interfered with DNA purification. 

Rich medium (yeast extract-peptone-dextrose (YEPD)) and synthetic complete 
medium, with bases and amino acids omitted as specified, were made as described*. 
YEP-raffinose, YEP-lactate and YEP-galactose were made as described”**. Cultures 
were grown at 30 °C. 

Analysis of BIR efficiency. DSBs were initiated by HO induction by addition of 
galactose’. BIR efficiency was determined genetically and by physical analysis in 
time-course experiments using PFGE as previously described’. The average effi- 
ciency of BIR at each time point was calculated based on results of four independent 
experiments. 

2D analysis of molecular intermediates of BIR. Cells were grown overnight in 
synthetic leucine drop-out media, transferred to YEP-raffinose, and incubated for 
~16 h, until cell density reached ~1 × 10⁷ cells ml⁻¹. An aliquot was taken for
analysis of the S-phase replication fork, and galactose was added to a final con- 
centration of ~2% to induce HO endonuclease in the remainder of the culture. In 
these experiments, the efficiency of BIR was 80 ± 15%, as determined by PFGE
analysis” 10 h after DSB (Extended Data Fig. 5c). DSB induction led to G2/M arrest 
~3 h after galactose addition as cells were in the process of completing BIR repair
(Extended Data Fig. 5d). At this point, nocodazole was added to the culture to a
final concentration of 0.015 mg ml⁻¹ to maintain the arrest. Cells were collected at
different intervals following the break and subjected to psoralen crosslinking that 
allowed one to constrain branch migration during DNA purification as previously 
described**. Chromosomal DNA was extracted and neutral/neutral 2D analysis 
was carried out according to ref. 35. PstI-digested DNA was separated in the first 
dimension on a 0.4% gel without ethidium bromide in 1× TBE buffer at 1 V cm⁻¹
for 22 h. The second dimension was run at 6 V cm⁻¹ in 1× TBE buffer containing
0.3 µg ml⁻¹ ethidium bromide for 12 h.

Alternatively, to guarantee that the observed intermediates do not result from
mechanical stress during genomic DNA preparation, we conducted 2D-gel elec- 
trophoresis using chromosomal DNA embedded in agarose plugs. In particular, 
cells collected at different intervals after induction of BIR were treated with psoralen
as described previously™*’. The cells were then re-suspended in 750 µl solution
of 1 M sorbitol, 0.5 M EDTA (pH 8) and treated with 0.2 mg ml⁻¹ lyticase for 1 h at
37°C. The spheroplasts were washed in a solution of 50 mM Tris, 50 mM EDTA 
and 100mM NaCl. The spheroplasts were then embedded in 0.8% low melt 
agarose at a concentration of 1.5 X 10° cells ml. The chromosomal DNA embed- 
ded in agarose was digested with BglII, and 2D-gel electrophoresis was carried out 
as described for the 2D analysis of PstI-digested chromosomal DNA. 

To identify regions of single-stranded DNA, a PstI or BglII digest was preceded 
by pre-incubation of genomic DNA with oligonucleotides that were complementary
to the PstI or BglII sites flanking the LYS2 gene and had the following sequences: 
5'-GGTCGCCCTGCAGCACAAGC-3’ (PstO3), 5’-GITCCTTTCCAGATCTTG 
GCAACTTT-3’ (BglO3), 5'-GCTTGTGCTGCAGGGCGACC-3’ (PstO5), 5’-A 
AAGTTGCCAAGATCTGGAAAGGAC-3’ (BglO5), where ‘O3’ and ‘O5’ indicate
oligonucleotides that are complementary to the Watson and Crick strands at the
centromere-proximal site, respectively; and 5’-TAGATGGCTGCAGAACCAGT-3' 
(PstO4), 5’-TGGATCTGGTAGATCTGTAAACTTGG-3’ (BglO4), 5’'-ACTGGT 
TCTGCAGCCATCTA-3’ (PstO6), 5’-CCAAGTTTACAGATCTACCAGATCCA-3' 
(BglO6), where ‘O4’ and ‘O6’ indicate oligonucleotides that are complementary to
the Watson and Crick strands at the telomere-proximal site, respectively.
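
The pairing of these oligonucleotides can be checked computationally. The following Python sketch is illustrative only and is not part of the original methods; the function and variable names are assumptions. For three of the pairs listed above, it tests that the two oligonucleotides are exact reverse complements of each other and that both carry the relevant restriction site (CTGCAG for PstI, AGATCT for BglII).

```python
# Illustrative check (not from the original methods): paired oligonucleotides
# should be reverse complements and should each contain the restriction site.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

PSTI_SITE = "CTGCAG"    # PstI recognition sequence
BGLII_SITE = "AGATCT"   # BglII recognition sequence

# Sequences copied from the text above (5' to 3').
OLIGO_PAIRS = {
    ("PstO3", "PstO5"): ("GGTCGCCCTGCAGCACAAGC", "GCTTGTGCTGCAGGGCGACC", PSTI_SITE),
    ("PstO4", "PstO6"): ("TAGATGGCTGCAGAACCAGT", "ACTGGTTCTGCAGCCATCTA", PSTI_SITE),
    ("BglO4", "BglO6"): ("TGGATCTGGTAGATCTGTAAACTTGG", "CCAAGTTTACAGATCTACCAGATCCA", BGLII_SITE),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_pairs(pairs):
    """Map each pair name to (is_reverse_complement, both_contain_site)."""
    results = {}
    for names, (first, second, site) in pairs.items():
        is_rc = reverse_complement(first) == second
        has_site = site in first and site in second
        results[names] = (is_rc, has_site)
    return results


if __name__ == "__main__":
    for names, (is_rc, has_site) in check_pairs(OLIGO_PAIRS).items():
        print(names, "reverse complements:", is_rc, "site present:", has_site)
```

Because each oligonucleotide spans the restriction site, annealing it to single-stranded DNA recreates a double-stranded site that the enzyme can cut, which is the rationale for the pre-incubation step.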

Southern hybridization was performed using LYS2 fragment obtained by PCR 
amplification of a 0.6-kb region of LYS2 (from 471835 to 472443 kb positions of 
chromosome II) or using HPH-hybridizing fragment obtained by PCR amplifica- 
tion of HPH from the pAG32 plasmid”*. 

Along with analysis of BIR intermediates, cell cycle distribution was analysed by
flow cytometry* and BIR kinetics were analysed by PFGE. For PFGE, chromosome
plugs were prepared* with genomic DNA embedded in plugs of 1% low-melting
agarose and separated at 6 V cm⁻¹ for 40 h using the CHEF DRII apparatus. PFGE
was followed by Southern analysis with an ADE1-specific probe labelled with ³²P.
Images were analysed using a Molecular Dynamics PhosphorImager.

DNA combing and fluorescent in situ hybridization. Cells were grown overnight
in synthetic leucine drop-out media, transferred to YEP-lactate, and incubated
for ~20 h, until cell density reached ~1 × 10⁷ cells ml⁻¹. Cells were arrested
by nocodazole added to 0.015 mg ml⁻¹, and DSBs were induced 2.5 h later by
addition of galactose to the final concentration of 2%. When experiments were
performed according to this protocol, the efficiency of BIR was 54.0 ± 9.8%, as
determined by PFGE analysis” 11 h after DSB induction (Extended Data Fig. 1a, b).
BrdU was added to the culture 3.5 h after DSB induction by galactose to the final
concentration of 0.4 mg ml⁻¹ after all normal DNA replication was completed but
before the beginning of BIR. Aliquots were removed to embed cells into agarose
plugs before and 11 h after induction of DSBs with galactose. In experiments
involving pif1Δ strains, the analysis was performed 13 h after DSB induction
due to slower kinetics of DSB repair in pif1Δ (data not shown). The uniform arrest
of cells at G2/M was confirmed by the absence of BrdU incorporation in any 
chromosomes other than chromosome III, which was assayed by PFGE analysis 
of yeast chromosomes extracted from samples taken before the addition of BrdU 
and 11 or 13 h after DSB induction and probing with anti-BrdU antibodies.

Genomic DNA preparation and molecular combing were performed as described”. 
Colour hybridization of chromosome III molecules was performed using three 
fluorescent probes. P1 probe was prepared using the TEF1/BSD plasmid (Invitrogen) 
and hybridized to the 15-kb region containing three tandem repeats of the TEF1/ 
BSD-snt1 plasmid inserted into the donor copy of chromosome III at position 
186535. P2 probe marked the position close to strand invasion during BIR and was 
comprised of a set of four 5-kb fragments that corresponded to the following 
positions on the donor chromosome III: 200205 to 205140, 205117 to 210385, 
210361 to 215385, and 215361 to 220337. The P3 probe highlights the region close 
to the telomeric end of chromosome III and is made up of three 5-kb fragments 
corresponding to the following positions on the donor chromosome: 274778 to 
279801, 279778 to 284814 and 284791 to 289782. The probes were made by PCR 
amplification of genomic DNA from AM2406. Nucleotide sequences of the pri- 
mers used to generate fragments for labelling are available upon request. Probes 
were labelled with biotin-dUTP. Hybridization and fluorescent detection of 
combed DNA molecules were achieved according to protocols described” with 
a few modifications. Successive layers of fluorophore-conjugated antibodies diluted 
in 1× PBST (1× PBS + 0.05% Tween) were used. For the biotin-conjugated probes,
the following series was used at a dilution of 1:4,000: (1) Alexa-488-Streptavidin 
(Molecular Probes; Life Technologies, catalogue no. 32354); (2) biotinylated anti- 
streptavidin (from Vector Lab, catalogue no. BA-0500); (3) Alexa-488-streptavidin; 
(4) biotinylated anti-streptavidin; and (5) Alexa-488-Streptavidin. To detect BrdU 
incorporation, the following series were used at the indicated dilutions: (1) 1:20 
dilution of mouse anti-BrdU (BD Biosciences, catalogue no. 347580); (2) 1:50
dilution of Cy3-coupled rat anti-mouse (Jackson ImmunoResearch Lab, catalogue 
no. 415-165-166); and (3) 1:50 dilution of Cy3-mouse anti-rat (Jackson Immuno- 
Research Lab, catalogue no. 212-165-168). All images were acquired using the 
Zeiss LSM 510 Confocal Microscope with 100× objective. The lengths of the
fluorescent stretches were calculated by comparison with the length of P1, P2 
and P3 hybridization signals. 
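
The calibration described above can be sketched as follows. This is a minimal illustration in Python, not the authors' image-analysis procedure: the probe coordinates are taken from the text, whereas the pixel measurements and function names are hypothetical. A probe of known genomic length provides a kb-per-pixel scale on the same fibre, which is then applied to a BrdU tract.

```python
# Illustrative sketch: convert a measured tract length (pixels) to kb using a
# probe signal of known genomic length on the same combed molecule.

# Probe fragment coordinates on the donor chromosome III (from the text).
P2_FRAGMENTS = [(200205, 205140), (205117, 210385), (210361, 215385), (215361, 220337)]
P3_FRAGMENTS = [(274778, 279801), (279778, 284814), (284791, 289782)]
P1_LENGTH_KB = 15.0  # P1 hybridizes to a 15-kb region


def probe_span_kb(fragments):
    """Approximate genomic span covered by a set of overlapping fragments, in kb."""
    start = min(first for first, _ in fragments)
    end = max(last for _, last in fragments)
    return (end - start) / 1000.0


def tract_length_kb(tract_px, probe_px, probe_kb):
    """Scale a tract measured in pixels by the kb-per-pixel ratio of a probe."""
    return tract_px * probe_kb / probe_px


if __name__ == "__main__":
    print("P2 span (kb):", probe_span_kb(P2_FRAGMENTS))
    print("P3 span (kb):", probe_span_kb(P3_FRAGMENTS))
    # Hypothetical measurements: a P1 signal of 100 px and a BrdU tract of 300 px.
    print("BrdU tract (kb):", tract_length_kb(300, 100, P1_LENGTH_KB))
```

With these coordinates, P2 spans roughly 20 kb and P3 roughly 15 kb, so any of the three probes can serve as an internal ruler when its signal is visible on the molecule.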

The statistical comparison between donor and recipient chromosomes in 
respect to BrdU incorporation was performed using the Chi-squared test. For 
each experiment, the frequency of semiconservative BIR (F) was calculated as 
follows: F = A/N × f × b, where A represents the number of donor molecules with
long BrdU tracts; N represents the total number of analysed donor molecules; f 
represents the efficiency of BIR in the experiment (calculated by physical analysis 
as a percentage of the truncated chromosome III converted in the BIR product’); 
and b represents the fraction of recipient molecules containing full and long 
interrupted BIR tracts. 

Mutagenesis associated with DSB repair. To determine mutation frequency 
associated with BIR, yeast strains were grown from individual colonies with agita- 
tion in liquid synthetic media lacking leucine for approximately 20h, diluted 20- 
fold with fresh YEP-Lac, and grown to logarithmic phase for approximately 16h. 
Next, 20% galactose was added to the culture to a final concentration of 2%, and 
cells were incubated with agitation for 7 h. Samples from each culture were plated 
at appropriate concentrations on adenine drop-out media and on media omitting 
lysine and adenine before (0 h) and 7 h after the addition of galactose (7 h) to
measure the frequency of Lys⁺ cells. To measure the frequency of Ura⁺ cells,
samples were plated at appropriate concentrations on adenine drop-out media 
and on media omitting uracil and adenine before (0 h) and 7 h after the addition of 
galactose (7 h). To determine spontaneous mutation frequencies, no-DSB strains 
were grown similarly to the DSB-containing strains. Because spontaneous muta- 
tion frequencies were calculated based on the number of mutations accumulated 
during many cell generations, the rate of spontaneous mutagenesis in no-DSB 
control strains was calculated using the following modification of the Drake equation:
μ = 0.4343 f/log(Nμ), where μ is the rate of spontaneous mutagenesis, f is the mutation
frequency, and N is the number of cells in the yeast culture. The rate of mutations after
galactose treatment (μ₇) was determined using a simplified version of the Drake
equation: μ₇ = (f₇ − f₀), where f₇ and f₀ are the mutation frequencies among Ade⁺
cells at times 7 h and 0 h, respectively. This modification was necessary because
experimental strains did not divide or underwent <1 division between 0 h and 7 h.
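
Because the rate appears on both sides of the Drake equation, it has to be solved numerically. The Python sketch below is one way to do this and is not taken from the original methods: it uses bisection, relies on the identity 0.4343 f/log₁₀(Nμ) = f/ln(Nμ), and all function names are assumptions. It also includes the simplified 7 h rate as a frequency difference, with negative differences reported as 0 as stated in the footnotes to the Extended Data tables.

```python
import math


def drake_rate(frequency, n_cells):
    """Solve mu = 0.4343 * f / log10(N * mu) for mu by bisection.

    Equivalent form used here: mu * ln(N * mu) - f = 0, with mu > 1/N.
    """
    if frequency <= 0:
        return 0.0

    def residual(mu):
        return mu * math.log(n_cells * mu) - frequency

    lo = 1.0 / n_cells                         # residual(lo) = -f < 0
    hi = max(frequency, math.e / n_cells)      # residual(hi) >= 0
    for _ in range(200):                       # residual is increasing on [lo, hi]
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def induced_rate(freq_7h, freq_0h):
    """Simplified rate for non-dividing cells: f7 - f0, floored at zero."""
    return max(freq_7h - freq_0h, 0.0)


if __name__ == "__main__":
    # Hypothetical culture: 1e9 cells with a mutation frequency of 4.6e-7.
    print("spontaneous rate:", drake_rate(4.6e-7, 1e9))
    print("induced rate:", induced_rate(3.0e-5, 5.0e-6))
```

The frequency difference is appropriate only under the condition stated above, namely that cells undergo less than one division between 0 h and 7 h, so mutations do not accumulate over generations.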

MMS was added at 1.5 mM 30 min after galactose addition. Cells were incubated
with agitation for 7 h, treated with 10% sodium thiosulphate to inactivate
MMS, diluted and plated. The loss of viability after MMS treatment was barely
detectable and never exceeded 40% independently of ura3-29 orientation. The rate
of mutations following MMS treatment was determined using a simplified version
of the Drake equation: μ₇ = (f₇ − f₀), where f₇ and f₀ are the mutation frequencies
among Ade⁺ cells at times 7 h (following MMS treatment) and 0 h, respectively.
This modification was necessary because experimental strains did not divide or
underwent <1 division between 0 h and 7 h in the presence of MMS.

Rates are reported as the median value (Fig. 2b, e and Extended Data Tables 1 
and 2), and the 95% confidence limits for the median are calculated for the strains 
with a minimum of six individual experiments. For strains with four to five individual
experiments, the range of the median was calculated. Statistical comparisons
between median mutation rates were performed using the Mann-Whitney
U-test”®. 
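
For small numbers of repeats, the Mann-Whitney comparison can be computed exactly by enumeration. The Python sketch below is a hedged illustration: the text does not state whether an exact or an approximate P value was used, so the exact two-sided test here is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided P value by enumerating all relabellings of the pooled data."""
    pooled = list(x) + list(y)
    n_x = len(x)
    centre = n_x * len(y) / 2.0                 # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_x):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if abs(u_statistic(group_x, group_y) - centre) >= observed - 1e-9:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical rates from two strains with seven repeats each,
    # chosen so that the two groups do not overlap.
    strain_a = [0, 1, 2, 3, 4, 5, 6]
    strain_b = [10, 11, 12, 13, 14, 15, 16]
    print("P =", exact_two_sided_p(strain_a, strain_b))
```

With seven repeats per strain and no overlap between the groups, only 2 of the 3,432 possible relabellings are as extreme as the observed one, giving P = 2/3,432 (about 0.0006), which is the smallest two-sided value attainable at that sample size.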

Analysis of BIR-induced Lys⁺ mutants. Lys⁺ revertants were obtained in BIR
mutagenesis experiments’. After phenotypic examination, cultures were grown
from mutants for chromosome analysis by PFGE using 1% low-melting agarose at
6 V cm⁻¹ for 48 h. DNA bands corresponding to the donor and repaired recipient
chromosome III were excised, equilibrated in β-agarase buffer (NEB), melted at
65 °C, and subjected to β-agarase treatment for 1 h at 40 °C. The obtained DNA
was PCR amplified using LYS2-specific DNA primers’, followed by sequencing
analysis.

Analysis of mutation spectra of ura3-29 Ura⁺ reversions. To determine the
spectrum of Ura⁺ in individual experiments, a portion of the URA3 gene from
independent Ura⁺ was PCR-amplified using URA3-specific primers: 5'-GIGTG
CTTCATTGGATGTTCGTAC-3’ and 5'-AAAAGGCCTCTAGGTTCCTTTGTT-3’ 
followed by sequencing analysis using 5’-CTGGAGTTAGTTGAAGCATTAGG-3’ 
as a primer. 

For experimental strains undergoing BIR repair, 7 h Ura⁺ BIR events (confirmed
as Ade⁺Leu⁻ on selective media) were sequenced. Because these cells
underwent <1 division between the 0 h and 7 h time points and the Ura⁺ frequency
at 7 h significantly exceeded that at 0 h, all Ura⁺ events resulting from DSB
repair were considered independent.

Extended Data Figure 1 | BIR efficiency during molecular combing analysis
of molecular intermediates of BIR. a, BIR efficiency was analysed by PFGE
from samples used for dynamic molecular combing analysis (Fig. 3d). DNA
was prepared from cells containing truncated chromosome III (Trunc Chr III)
before DSB induction and 11 h or 13 h after DSB induction from wild-type
(PIF1) and pif1Δ cells, respectively. In pif1Δ, a later time point (13 h) was
analysed owing to slower kinetics of DSB repair in pif1Δ as compared to PIF1.
Chromosomes were separated by PFGE followed by Southern hybridization
with an ADE1-specific probe. b, Quantification of DSB repair efficiency (BIR,
or other recombination pathways) based on the results of 3-5 individual
experiments and presented as average ± s.d. c, Schematic of the BIR assay.
Interruption of BIR leads to the resolution of BIR intermediates resulting in
half-crossover formation.

Extended Data Figure 2 | Analysis of molecular mechanism and mutagenesis
associated with BIR. a, The summary of molecular combing analysis presented
in Fig. 3 and in Extended Data Fig. 3 is shown. A strong bias towards BrdU tracts
present only in the recipient chromosome was also observed in three additional
independent experiments. b, Mutation spectra of BIR-induced base substitutions
in ura3-29 in the presence or absence of 1.5 mM MMS is shown.
Panel a, table columns: Relevant genotype; Molecule analysed; Full BIR (~100 kb BrdU);
Long BIR (>30 kb BrdU); Short BrdU patches* (<20 kb BrdU); No BrdU; Total.
* BrdU patches between P2 and P3 are included. [ ], overlaps with P2; ( ), overlaps with P3.
Panel b, BIR Ura⁺ mutations by orientation, with and without exposure of cells to 1.5 mM MMS.
* and **, statistically different from Ori1 (P < 0.0001 and P = 0.04, respectively).

Extended Data Figure 3 | Molecular outcomes of BIR. a, Left: interrupted
BrdU tract in recipient may result from half-crossover. Right: an example of
wild-type (PIF1) recipient with interrupted BrdU tract hybridized to P1, P2, P3
probes (green) and treated with anti-BrdU antibody (red). b, Left: BIR initiated
by strand invasion between FS2 (inverted repeat of Ty1 located 30 kb
centromere proximal to MAT) and P1 results in formation of recipients
hybridizing to P1, P2, P3 and BrdU. Right: an example of wild-type (PIF1)
recipient. Top: hybridization to P1, P2, P3. Middle: treatment with anti-BrdU
antibody. Bottom: merge. c, Left: BrdU incorporation in the recipient resulting
from BIR (red) and from filling-in synthesis (pink) following extensive
resection. Right: an example of wild-type (PIF1) recipient. Top: hybridization
to P1, P2, P3. Middle: treatment with anti-BrdU antibody. Bottom: merge.
d, Left: HJ resolution at the end of BIR progression leads to switch from
conservative to semiconservative BIR resulting in a short patch of BrdU
overlapping with P3 in the donor. Right: an example of BrdU incorporation in
the donor from wild-type (PIF1) strain hybridized to P1, P2, P3 and treated
with anti-BrdU antibody.

Extended Data Figure 4 | Conservative DNA synthesis associated with BIR.
Results from a series of three experiments where only P1, P2 and anti-BrdU
antibody were used. a, BrdU incorporation in the recipient is expected from
conservative BIR (i; red) and from filling-in synthesis (pink) following
extensive resection (ii). b, c, Analysis of the donor (D) and repaired recipient
(R) chromosomes extracted after PFGE (b) and hybridization with probes
(green tract) and treatment with anti-BrdU antibodies (red tract) (c). No BrdU
tracts are visible in more than 97% of donors. The repaired recipient contains
long stretches of BrdU overlapping with the P2 region.

Extended Data Figure 5 | BIR kinetics during 2D analysis of molecular
intermediates of BIR. a, BIR kinetics was analysed by PFGE from samples
used to determine the structure of BIR intermediates by 2D electrophoresis
(Fig. 4c, d). DNA was prepared for PFGE at intervals after induction of DSBs at
MATa and separated by PFGE (a) followed by Southern hybridization with an
ADE1-specific probe (b). c, BIR efficiency quantified based on the results of four
individual experiments including the one shown in Fig. 4 presented as
average ± s.d. d, Flow cytometry of DNA analysis of cells undergoing BIR
repair.

Extended Data Figure 6 | The structure of molecular intermediates of BIR.
a, The structure of the chromosome III region with LYS2 inserted 16 kb
centromere distal to MATα-inc. P1, P2, P3, and so on designate positions of PstI
sites flanking LYS2. b, The structure of replication bubbles migrating through
LYS2 (with black rectangle designating LYS2-specific probe). i, Replication
bubble with synchronous leading and lagging strands (double-stranded). ii,
Replication bubble with delayed initiation of the lagging strand with respect to
the leading strand (partially single-stranded bubble). iii, A partially single-stranded
bubble with one or several PstI sites behind the bubble inactivated due
to accumulation of single-stranded DNA. Red and pink rectangles represent
oligonucleotides PstO3 and PstO4, respectively. iv, A single-stranded bubble
that has passed beyond the P3-P4 region. c, Theoretical bubble-migration
curves for the intermediates shown in b. d, Calculation of parameters of the
bubble-like structures for the intermediates shown in b.

Extended Data Figure 7 | Molecular intermediates of BIR. BIR
intermediates were analysed by 2D gel electrophoresis of BglII-digested intact 
chromosomal DNA embedded in agarose plugs. a, D-loop migration in 2D gels 
(hybridized to LYS2, black oe during coordinated i) and uncoordinated 


of. replication and BIR intermediates Annealing to oligonucleotides (BglO3 
and BglO4) restores BglII sites (B) in ssDNA (see a, ii) and changes migration of
the intermediate as shown by 2' (red). c, 2D analysis of Y-arc during normal
replication (0 Hr) and bubble-like structures at time points after BIR induction. 
Similar bubble structures were observed in nine additional independent 
experiments (see the legend to Fig. 4). d, High-molecular-mass tails (arrows) 
disappear after simultaneous addition of BglO3 and BglO4 (BIR/ 
BglO3+BglO4). The addition of each of these oligonucleotides individually 
(BIR/BglO3 or BIR/BglO4) failed to eliminate the tail. 

Extended Data Table 1 | The rate of spontaneous and DSB-associated Ura⁺ mutations

Rate of Ura⁺ (x10°)*, given as median (CI or range†) [# of repeats]

Orientation of ura3-29 | HO site | Before galactose (0 h) | After galactose (frequency (7 h − 0 h)), no MMS | After galactose (frequency (7 h − 0 h)), 1.5 mM MMS | Fold above no-damage‡ (P-value)
Ori1 | DSB | 28 (10-41) [7] | 3,646 (2,305-4,159) [7] | 9,415 (4,911-11,061) [7] | 2.6 (P=0.0006)
Ori2 | DSB | 6 (5-17) [7] | 1,903 (979-2,941) [7] | 41,835 (34,830-79,488) [7] | 22 (P=0.0006)
Ori1 | No | 8 (5-10) [17] | 0 (0-11) [10] | 198 (49-358) [7] | 24.8
Ori2 | No | 12 (5-19) [13] | 0 (0-7) [7] | 157 (101-245) [7] | 13.1

BIR efficiency (%)§

Orientation of ura3-29 | HO site | No MMS, Ade⁺ | No MMS, Ade⁺Ura⁺ | 1.5 mM MMS, Ade⁺ | 1.5 mM MMS, Ade⁺Ura⁺
Ori1 | DSB | 80 ± 8 | 94 ± 8 | 84 ± 10 | 98 ± 0.4
Ori2 | DSB | 74 ± 9 | 94 ± 2 | 76 ± 14 | 92 ± 2
Ori1 | No | N/A | N/A | N/A | N/A
Ori2 | No | N/A | N/A | N/A | N/A

* Rates calculated at 0 h based on 0 h frequencies using the Drake equation (see Methods for details). At 7 h, rates were calculated as (7 h frequency − 0 h frequency); differences <0 are reported as ‘0’.
† For strains with ≥6 experiments, the 95% CI of the median is given.
‡ Statistically significant elevation of 7 h mutation rate in strains in the presence of MMS over 7 h mutation rate in the absence of MMS.
§ Per cent of BIR (average ± s.d.) calculated based on 3-6 experiments among DSB repair outcomes collected at 7 h on either adenine dropout media (Ade⁺) or on adenine/uracil dropout media (Ade⁺Ura⁺).

Extended Data Table 2 | The rate of DSB-associated Lys⁺ mutations (top), and the rate of spontaneous Lys⁺ mutations (bottom)

Rate of Lys⁺ (x10°)*, given as median (CI or range†) [# of repeats]; BIR efficiency (%)§

Position | Construct | HO site | Relevant genotype | Before galactose (0 h) | After galactose (frequency (7 h − 0 h)) | Fold below WT‡ (P-value) | BIR efficiency, Ade⁺ | BIR efficiency, Ade⁺Lys⁺
16 kb | A4 | DSB | wt | 40 (12.7-64.3) [13] | 2,690 (1,073.0-4,361) [13] | NA | 77 ± 12 | 99.7 ± 0.5
16 kb | A4 | DSB | pif1Δ | 6 (4.0-10.4) [14] | 134.7 (104-1,580) [14] | 20 (0.0001) | 73 ± 11 | 99 ± 3
36 kb | A4 | DSB | wt | 5.3 (2.7-15.2) [8] | 1,248.10 (860-1,552) [8] | NA | 80 ± 1 | 99 ± 1
36 kb | A4 | DSB | pif1Δ | 1 (0.5-12) [13] | 1.4 (0-4.7) [13] | 892 (0.0003) | 91 ± 4 | 100#

Rate of Lys⁺ (x10°)*, given as median (CI or range†) [# of repeats]

Position | Construct | HO site | Relevant genotype | Rate of Lys⁺
16 kb | A4 | No | wt | bl (3.3-34) [10]
16 kb | A4 | No | pif1Δ | 5.3 (3.3-7.5) [6]
36 kb | A4 | No | wt | 14 (0.7-5.4) [4]
36 kb | A4 | No | pif1Δ | 0.9 (0.6-3.9) [9]

* Rates calculated at 0 h based on 0 h frequencies using the Drake equation (see Methods for details). At 7 h, rates were calculated as (7 h frequency − 0 h frequency); differences <0 are reported as ‘0’.
† For strains with ≥6 experiments, the 95% CI of the median is given. For the strains with <6 experiments, the median range is given.
‡ Statistically significant decrease of median rate at 7 h in pif1Δ compared to wild type.
§ Per cent of BIR (average ± s.d.) calculated based on 4-8 experiments among DSB repair outcomes collected at 7 h on either adenine dropout media (Ade⁺) or on adenine/lysine dropout media (Ade⁺Lys⁺).
# No s.d. could be calculated because of a very low number of Lys⁺ (between 1 and 5) in each experiment.

Extended Data Table 3 | Strain list

Strain name | Genotype | Reference
AM1003 | MATa-LEU2-tel/MATα-inc ade1 met13 ura3 leu2-3,112/leu2 thr4 lys5 hml::ADE1/hml::ADE3 hmr::HPH ade3::GAL-HO FS2::NAT/FS2 | 2
AM1291 | AM1003, but lys2Δ thr4::lys2-Ins(A4) | 9
AM1449 | AM1291, but MATa-inc-LEU2-tel | 9
AM1482 | AM1003, but lys2Δ sed4::lys2-Ins(A4) | 9
AM1649 | AM1482, but MATa-inc-LEU2-tel | 9
AM2191 | AM1291, but pif1::KANMX | this study
AM2247 | AM2191, but MATa-inc-LEU2-tel | this study
AM2198 | AM1482, but pif1::KANMX | this study
AM2257 | AM2198, but MATa-inc-LEU2-tel | this study
AM1247 | AM1003, but lys2Δ thr4::LYS2 | 9
AM2439 | AM1291, but snt1::(TEF1/BSD) | this study
AM2438 | AM1482, but snt1::(TEF1/BSD)3 | this study
AM2118 | AM1247, Chr Il::KANMX | this study
AM2442 | AM2118, but MATa-inc-LEU2-tel | this study
AM2406 | AM1003, but ura3::p306-BrdU tps1::KANMX snt1::(TEF1/BSD)3 | this study
AM2846 | AM2406, but tps1::BLEO pif1::KANMX | this study
AM2110 | AM1003, but lys2Δ ura3Δ sed4::lys2-Ins(A4) hmr::KANMX | this study
AM2161 | AM2110, but thr4::ura3-29 (Ori1) | this study
AM2259 | AM2161, but MATa-inc-LEU2-tel | this study
AM2820 | AM2110, but thr4::ura3-29 (Ori2) | this study
AM2842 | AM2820, but MATa-inc-LEU2-tel | this study

Pif1 helicase and Polδ promote recombination-coupled
DNA synthesis via bubble migration


Marenda A. Wilson'*, YoungHo Kwon**, Yuanyuan Xu’, Woo-Hyun Chung’, Peter Chi*, Hengyao Niu’, Ryan Mayle’, 


Xuefeng Chen', Anna Malkova°t, Patrick Sung” & Grzegorz Ira! 


During DNA repair by homologous recombination (HR), DNA
synthesis copies information from a template DNA molecule. Multiple
DNA polymerases have been implicated in repair-specific DNA
synthesis’~*, but it has remained unclear whether a DNA helicase is
involved in this reaction. A good candidate DNA helicase is Pif1, an
evolutionarily conserved helicase in Saccharomyces cerevisiae important
for break-induced replication (BIR)* as well as HR-dependent
telomere maintenance in the absence of telomerase* found in 10-15%
of all cancers®. Pif1 has a role in DNA synthesis across hard-to-replicate
sites”* and in lagging-strand synthesis with polymerase δ
(Polδ)*""’. Here we provide evidence that Pif1 stimulates DNA synthesis
during BIR and crossover recombination. The initial steps of
BIR occur normally in Pif1-deficient cells, but Polδ recruitment and
DNA synthesis are decreased, resulting in premature resolution of
DNA intermediates into half-crossovers. Purified Pif1 protein strongly
stimulates Polδ-mediated DNA synthesis from a D-loop made by
the Rad51 recombinase. Notably, Pif1 liberates the newly synthesized
strand to prevent the accumulation of topological constraint and to
facilitate extensive DNA synthesis via the establishment of a migrating
D-loop structure. Our results uncover a novel function of Pif1
and provide insights into the mechanism of HR.

To understand how Pif1 promotes BIR, we used an established system
wherein only one end of a site-specific double-strand break (DSB) has
extensive homology to the donor sequence, so that most cells (>80%)
use BIR for repair’. After strand invasion, over 100 kilobases of the
full-length chromosome III donor is copied. Chromosomal markers
provide a means to determine the frequency of BIR or alternative mechanisms
by growth on selective media (Fig. 1a). BIR was evaluated in pif1-m2
cells, wherein the mutant Pif1 protein is excluded from the nucleus
but retains mitochondrial function”, or in pif1Δ cells.

Cells lacking Pif1 are BIR deficient and have a large increase in half-crossover
products (Fig. 1b), with pif1Δ showing a greater impairment,
probably because the pif1-m2 protein retains residual nuclear activity”.
The Pif1 helicase activity is indispensable for BIR as revealed by testing
the helicase-dead pif1(K264A) mutant (Fig. 1b). Southern blot analysis
showed loss of the template chromosome in pif1Δ consistent with an
increase in half-crossover products (Fig. 1c and Extended Data Fig. 1a).
An examination of repair products from individual colonies revealed
elevated gross chromosomal rearrangements and changes in template
chromosome size, which probably stemmed from half-crossovers (Extended
Data Fig. 2a-d). The role of Pif1 in BIR is general and highly specific,
as BIR induced at a different locus is also Pif1-dependent (Extended
Data Fig. 3a-e) and elimination of two other 5′-3′ helicases, Rrm3 or
Hcs1, does not affect BIR (Extended Data Fig. 1b). The BIR function of
Pif1 is unrelated to its known role in telomerase inhibition, as the
elimination of telomerase components does not suppress the BIR
defect of pif1-m2 cells (Extended Data Fig. 1c).


A similar deficiency in BIR with high levels of half-crossovers was
observed in pol32Δ cells, which lack the nonessential subunit of Polδ
(ref. 12) (Extended Data Fig. 1b), implicating Pif1 in DNA synthesis.
Consistent with this deduction, strand invasion occurs normally in
pif1Δ cells (Extended Data Fig. 4a, b), whereas DNA synthesis monitored
by quantitative polymerase chain reaction (qPCR) is decreased
(Fig. 1d). In chromatin immunoprecipitation (ChIP) analyses we
found Pif1 enrichment at the DSB and along the template molecule,
further implicating a role of Pif1 in DNA synthesis (Fig. 1e). This Pif1
enrichment is repair-specific, as it is compromised in mutants deficient
in strand invasion or extensive DNA synthesis (Extended Data
Fig. 4c, d). Because Pif1 seems to affect DNA synthesis, we used ChIP
to examine the initial recruitment of Polδ, which is essential for BIR",
and other polymerases. Notably, only the recruitment of Polδ is decreased
in pif1-m2 cells (Fig. 1f).

Figure 1 | Pif1 promotes DNA synthesis during BIR. a, Schematic of the BIR
assay. Products are distinguished by genetic markers. b, Repair outcomes in
wild-type (WT) and indicated mutant cells. c, Quantification of Southern blot
band intensities corresponding to the DSB repair products and template
chromosome in wild-type and pif1Δ cells. HC, half-crossover. d, Analysis of
initial DNA synthesis by PCR. e, ChIP analysis of Pif1-13xMyc recruitment at
the indicated loci. f, DSB recruitment of the indicated polymerases during BIR
as measured by ChIP. Plotted in c, e, f, are the mean values ± s.d. n = 3.

Besides BIR, fully processive Polδ is needed for the crossover HR
pathway and promotes long conversion tracts’>. Pif1 is also needed for
both processes as monitored in ectopic or allelic gene conversion
assays (Fig. 2a-c and Extended Data Fig. 1d). There is no change in
cell viability or repair efficiency in pif1Δ cells, but the crossover frequency
decreases by half, similar to pol32Δ cells (Fig. 2c). Furthermore,
the increase in crossover frequency caused by deleting the crossover
suppressors Mph1 and Srs2 (refs 16, 17) is also dependent on Pif1 and
Pol32 (Fig. 2b, c). Thus, Pif1 and Polδ are key factors in crossover
recombination. Indeed, the conditional depletion of Polδ but not other
polymerases almost completely eliminates crossovers while slightly
reducing non-crossovers (Extended Data Fig. 1e).

We performed biochemical reconstitution to examine how Pif1 influences
DNA synthesis in D-loops made by Rad51, RPA and Rad54
(refs 18, 19). According to a published procedure’, we loaded the polymerase
clamp PCNA onto the primer end of the D-loop with RFC,
then added Polδ with Pif1 (Fig. 3a). Our Pif1, RFC, PCNA and Polδ
preparations had no detectable nuclease or topoisomerase contamination
(Extended Data Fig. 5a, b).

We first made the D-loop with a 5’ **P-labelled 90-mer oligonu- 
cleotide as the invading strand and pBluescript (2,961 base pairs (bp)) 
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Figure 2 | Pif1 is important for crossover recombination. a, Schematic of the 
ectopic recombination assay. b, Southern blot analysis of gene conversion with 
and without crossovers in the indicated strains. CO, crossover; NCO, non- 
crossover; P, parental. c, Quantification of crossover frequency in ectopic 
recombination. d, Repair outcomes in BIR assay in wild-type and the indicated 
mutant cells. e, ChIP analysis of RPA binding during BIR at the indicated loci. 
f, Quantification of BIR product formation upon conditional depletion of Psf2 
or Mcm4. Plotted in c, e, f, are the mean values + s.d. n = 3. 
Figure 3 | Pif1 promotes DNA extension at a D-loop. a, Extension of
³²P-labelled invading strand. Extension products by Polδ with Pif1 or
pif1(K264A) (K/A) (13, 27, 40 nM) were analysed in a native gel (top) or
denaturing gel (bottom). Representative gels from three independent experiments
are shown. b, Extension of unlabelled invading strand with [α-³²P]dCTP. Products
were resolved in a native gel and quantified. The mean values ± s.d. from three
independent experiments are plotted. c, Products prepared as in lanes 1, 10 and
13 of panel a were subjected to two-dimensional gel analysis.


DNA as acceptor (Fig. 3a). DNA synthesis by Polδ generated DNA
species that migrated above the D-loop (Fig. 3a, lanes 2 and 10).
Notably, the addition of Pif1 led to the appearance of DNA species
that harboured a much larger amount of new synthesis (Fig. 3a, lanes
3-5, 11-13). With Pif1 present, DNA species migrating above the
substrate oligonucleotide but below the D-loop were also observed
(indicated by the asterisk in Fig. 3a). Because Pif1 can disrupt the
D-loop structure (Extended Data Fig. 5c), these DNA species probably
stemmed from the release of the extended invading strand.

The helicase activity of Pif1 is required for the stimulation of
Polδ-mediated DNA synthesis, as revealed by analysing the pif1(K264A)
mutant (Fig. 3a). Likewise, no DNA synthesis occurred if either
Rad51 or Rad54 was absent or upon the omission of RFC, PCNA, or
deoxynucleoside triphosphates (Extended Data Fig. 5c). Efficient
D-loop formation and optimal DNA synthesis require RPA (Extended
Data Fig. 5c), in concordance with previous observations that it
promotes Rad51-mediated strand exchange and DNA unwinding by
Pif1 (ref. 22). Notably, in the absence of Pif1, RPA was unable to promote
extensive synthesis by itself (Fig. 3 and Extended Data Fig. 5c), even
when present in excess (data not shown). In addition, when either
Rad51 or Rad54 was removed (Extended Data Fig. 6a), or when Rad54
was heat-inactivated after D-loop formation (Extended Data Fig. 6b),
Pif1 was still able to stimulate DNA synthesis.

In another set of experiments, an unlabelled invading strand was
extended with [α-³²P]dCTP present (Fig. 3b). Two-dimensional gel
electrophoresis (Fig. 3c) showed that the extended DNA species made by
Polδ harboured ~200-500 nucleotides (Fig. 3c), whereas the products
made by Polδ with Pif1 could reach a few thousand nucleotides (Fig. 3c
and Extended Data Fig. 5d).

The effect of Pif1 is highly specific, as neither S. cerevisiae Rrm3 nor
Escherichia coli DinG, which possess 5′–3′ helicase activity, could
substitute for it. Likewise, no enhancement of Polδ-mediated DNA
synthesis occurred with the 3′–5′ S. cerevisiae Mph1 helicase (Extended
Data Fig. 7a). Moreover, Pif1 has no effect on E. coli DNA polymerase I
Klenow fragment (Extended Data Fig. 7b).

S. cerevisiae cells lacking the POL32 gene are impaired for HR.
The Pol32 protein interacts with PCNA and enhances the processivity
of Polδ (ref. 25). We observed a severely reduced level of DNA synthesis
by Polδ*, which lacks Pol32 (Extended Data Fig. 8a), alone or in
conjunction with Pif1 (Extended Data Fig. 8c). Purified Pol32 interacts
with Polδ* (Extended Data Fig. 8b) and its addition to Polδ* led to an
enhancement of DNA synthesis activity comparable to that of Polδ
(Extended Data Fig. 8c).

Extensive DNA synthesis in a covalently closed DNA molecule
generates topological stress that would impede polymerase movement,
yet in our reconstituted system several kilobases of DNA can be
synthesized without a topoisomerase (Fig. 3c and Extended Data Fig. 5d).
In fact, whereas topoisomerase I enhanced DNA synthesis by Polδ
alone (Extended Data Fig. 9a, lanes 5 and 6), it had no stimulatory
effect when Pif1 was present (Extended Data Fig. 9a, b). Thus,
Pif1-dependent DNA synthesis may entail concomitant dissociation of the
newly synthesized DNA from the 5′ side, a premise supported by our
observation that Pif1 can efficiently dissociate both the unmodified
(Extended Data Fig. 5c) and the extended D-loops (Fig. 3a).

We tested the hypothesis that Polδ–Pif1-mediated DNA synthesis
occurs within a migrating D-loop. First, the extended D-loops were
analysed by restriction digests (Fig. 4a). If the extended invading strand
were released, then a significant fraction of the D-loop would be
resistant to the enzymes AhdI and XmnI, which incise at 115 and 714
nucleotides from the 5′ terminus of the invading strand, respectively.
D-loops were made with a 5′ ³²P-labelled invading strand and extended
with unlabelled deoxynucleotides (see Fig. 3a), followed by treatment
with AhdI or XmnI and analysis in a denaturing gel (Fig. 4a). The
D-loop extended by Polδ alone could be cleaved quantitatively by
AhdI to produce a 115-nucleotide DNA fragment (lanes 7 and 9),
whereas most of the extended product made with Polδ–Pif1 was resistant.
Little of the Polδ-extended D-loop was susceptible to XmnI, consistent
with the short Polδ-alone synthesis tract (Fig. 3). A small fraction of the
extended D-loop from the Polδ–Pif1 reaction was cleavable by XmnI to
generate the 714-nucleotide fragment, indicative of DNA synthesis
proceeding beyond the +714 site and of the fact that much of the
+714 site in the extended DNA existed as single-stranded DNA
(ssDNA) (Fig. 4a, lanes 13 and 15). Thus, in the Polδ–Pif1 reaction,
Pif1 continually dissociates the extended strand from the D-loop. This
‘bubble migration’ mode of DNA synthesis was previously suggested
for a reconstituted bacteriophage T4 system that harbours the Dda
helicase.

We next examined the extended D-loops by electron microscopy.
We treated the D-loop products with the T4 gp32 protein to decorate
the ssDNA region, followed by protein-DNA crosslinking with
glutaraldehyde. The crosslinked nucleoprotein complexes were analysed
by electron microscopy with metal shadowing. Figure 4b shows typical
electron microscopy images of pBluescript DNA, unextended D-loop,
D-loops extended by Polδ, and D-loops extended by Polδ–Pif1. This
analysis clearly showed D-loop enlargement by Polδ and that the
Polδ-extended invading strand remains hybridized to its complementary
strand. Notably, the inclusion of Pif1 generated a long ssDNA tail
protruding from the D-loop (Fig. 4b). Furthermore, we tested Pif1
for interaction with Polδ and PCNA in affinity pull-down reactions.
The analysis revealed that Pif1 physically associates with PCNA but not
with Polδ (Extended Data Fig. 8d).

We next asked whether BIR in cells entails the formation of a 
canonical replication fork with the replicative helicase Mcm2-7 having 
a crucial role. Several observations suggest that this is not the case. 
First, mutants deficient in structure-specific resolvases show only a 
Figure 4 | Evidence for DNA synthesis in a migrating D-loop.
a, Biochemical analysis. Products were analysed in a denaturing gel. A
representative gel from three independent experiments is shown. RE,
restriction enzyme. b, Electron microscopy analysis. Micrographs of plasmid
(pBluescript) DNA, D-loop made by Rad51, and extension products by Polδ
and Polδ–Pif1. ssDNA was decorated with T4 gp32 and appears thicker than
duplex DNA. Arrows identify the D-loop. Scale bar, 100 nm. Representative
micrographs from two experiments are shown. c, Model depicting the dual role
of Pif1. In BIR and crossover HR, Pif1 promotes DNA synthesis by a functional
interaction with Polδ–PCNA (this work), by template strand separation, and by
displacing the newly synthesized strand (this work).


mild BIR deficiency, indicating that in BIR the D-loop does not need
to be converted to a canonical replication fork (Fig. 2d). Second,
monitoring the association of RPA with the template chromosome using
ChIP revealed that extensive ssDNA is generated during template
copying in a Pif1-dependent manner. We confirmed the presence of
extensive ssDNA by applying qPCR after restriction digest of DNA
synthesis intermediates (Fig. 2e and Extended Data Fig. 10a, b). These
results indicate that the first and complementary strands are
synthesized asynchronously. Finally, conditional depletion of Mcm4 or Psf2,
components of the S-phase replication fork, leads to only a mild BIR
deficiency (Fig. 2f and Extended Data Fig. 10c-e). Although we cannot
exclude the possibility that in wild-type cells there is a switch from
Pif1- to Mcm2-7-mediated synthesis, we have provided clear evidence
that Pif1 can support extensive DNA synthesis in the absence of the
replicative helicase.

Our results provide evidence for repair-specific Pif1-dependent
DNA synthesis via a migrating D-loop (Fig. 4c) that can copy tens of
kilobases. Aside from BIR and telomere recombination, such a
mechanism could function in gene conversion in fungi and can cause
various genome rearrangements.

METHODS SUMMARY


The strains listed in Methods are derivatives of (1) tGI354 to study ectopic
recombination (hml::ADE1 MATa-inc hmr::ADE1 ade1 leu2-3,112 lys5 trp1::hisG
ura3-52 ade3::GAL::HO arg5,6::HPH::MATa) and (2) AM1003 to study BIR
(MATa-LEU2-tel/MATα-inc ade1 met13 ura3 leu2-3,112/leu2 thr4 lys5
hml::ADE1/hml::ADE3 hmr::HYG ade3::GAL-HO FS2::NAT/FS2). The DSB was induced
upon expression of the HO endonuclease by adding galactose to the media.
Southern blotting and probes specific for either the broken or template
chromosome were used to follow the kinetics of DSB repair and for the detailed
analysis of individual repair products. Protein recruitment to DSBs was studied
by ChIP followed by qPCR. Pif1 and other helicases, homologous recombination
proteins, and DNA replication factors were expressed either in E. coli or yeast
cells and purified by a multi-step procedure to near homogeneity in each case.
For the DNA synthesis reaction, a D-loop is made using Rad51, RPA and Rad54,
followed by loading of PCNA with RFC onto the 3′ end of the invading strand,
and incubation with combinations of Pif1 or another helicase with Polδ.
Reaction products were analysed by gel electrophoresis and phosphorimaging, or
by electron microscopy with metal shadowing.

METHODS

Media, strains and plasmids. The plasmids pVS31 (pif1-m2), pSH380 (PIF1) and
pSH380 (pif1(K264A)) were from V. Zakian. The pif1-m2 mutation was
introduced into the genome as described previously. For complementation tests,
we amplified PIF1 from pSH380 (PIF1) and introduced it into pRS316 to create
pRS316-PIF1; pif1(K264A) was created by subcloning the 0.7-kb AflII/ClaI
fragment from pSH380 (pif1(K264A)) into pRS316 (PIF1).

For HO induction, cells (GAL10::HO) from an overnight culture in YEPD (1%
yeast extract, 2% peptone, 2% dextrose) were transferred to YEP-raffinose (1%
yeast extract, 2% peptone, 2% raffinose) and incubated overnight. Galactose was
added to 2% when the cell density reached ~1 × 10⁷ cells ml⁻¹.

To study ectopic recombination, allelic BIR, or ectopic BIR we used tGI354,
AM1003, or JRL346 strains, respectively, or their derivatives. tGI354 hml::ADE1
MATa-inc hmr::ADE1 ade1 leu2-3,112 lys5 trp1::hisG ura3-52 ade3::GAL::HO
arg5,6::HPH::MATa (ref. 16) and its derivatives: pif1-m2 (yWH42); pif1::KANMX
(yWH1217); pol32::KANMX (yWH80); pif1-m2 pol32::KANMX (yWH198);
pif1::KANMX pol32::URA3 (yWH1226); mph1::KANMX (tGI772);
pif1-m2 mph1::KANMX (yWH1043); pol32::TRP1 mph1::KANMX (yWH221);
srs2::LEU2 (tGI383); pif1-m2 srs2::LEU2 (yWH1072);
pol32::KANMX srs2::LEU2 (yWH1076); pol2-16 (yWH1116); pol3-14 (yWH1103);
rad30::KANMX (yWH1222).

AM1003 MATa-LEU2-tel/MATα-inc ade1 met13 ura3 leu2-3,112/leu2 thr4 lys5
hml::ADE1/hml::ADE3 hmr::HYG ade3::GAL-HO FS2::NAT/FS2 (ref. 12) and its
derivatives: trp1::hisG (yWH422); leu2::KANMX (yWH271); pho87::URA3 (yWH279);
pif1-m2 (yWH121); pif1-m2 + pRS316-PIF1 (yWH530);
pif1-m2 + pRS316-pif1(K264A) (yWH531); pol32::KANMX (yWH321);
pif1-m2 pol32::KANMX (yWH304); pif1::KANMX (yWH465); rad51::KANMX (yWH615);
rad54::KANMX (yWH616); exo1::TRP1 sgs1::KANMX (yWH612);
pif1-m2 tlc1::LEU2 (yWH308); pif1-m2 est2::KANMX (yWH328);
hes1::KANMX (yAP427); pif1-m2 pho87::URA3 (yWH298);
POL1-13Myc-TRP1 (yWH1176); pif1-m2 POL1-13Myc-TRP1 (yWH1177);
POL2-13Myc-TRP1 (yWH499); pif1-m2 POL2-13Myc-TRP1 (yWH501);
POL3-13Myc-TRP1 (yWH634); pif1-m2 POL3-13Myc-TRP1 (yWH1110);
RAD30-13Myc-TRP1 (yWH1179); pif1-m2 RAD30-13Myc-TRP1 (yWH1178);
pif1-m2 MCM7-3HA-TRP1 (yWH1056);
pif1-m2 pif1-m1-4Myc-TRP1 pol32::KANMX (yWH971);
pif1-m2 pif1-m1-4Myc-TRP1 rad52::KANMX (yWH1003);
ura3::URA3 thr4::THR4 (yMW331); pif1::KANMX (yMW335);
MATa-LEU2::URA3-tel/MATα-inc (yMW393); CUP1::mcm4-td::KANMX (yMW412);
CUP1::psf2-td::KANMX (yMW467).

JRL346 mata::HOcsDEL::hisG ura3DEL851 trp1DEL63 sup53DEL::leu2DEL::NATMX
hmlDEL::hisG hmrDEL::ADE3 ade3::GAL10::HO can1,1-1446::HOcs::HPH::DEL AVT2
ykl215c::LEU2::hisG::can1DEL1-289 (ref. 14); and its derivative,
pif1::KANMX (yGI272).

Pulsed-field gel electrophoresis and analysis of DSB repair products. To
analyse DSB repair kinetics and products in AM1003 derivative strains,
chromosomal DNA plugs were prepared and separated as described, followed by
Southern blotting and hybridization with probes specific for either ADE1, ADE3,
MCH4 or MAT (ref. 4). Allelic BIR product formation was estimated as the per
cent of the initial uncut chromosome III. The percentage of the chromosome III
template remaining during repair was measured as the normalized intensity of
the band corresponding to chromosome III at each time point after break
induction, multiplied by 100 and divided by the intensity of the band
corresponding to chromosome III at time point ‘0’.
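
The template-remaining calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the example intensities and the use of a reference band in the same lane for normalization are all assumptions, because the text says only that intensities were "normalized".

```python
def percent_template_remaining(chr3_band, reference_band, t0=0):
    """Return {time_h: percent of chromosome III template remaining}.

    chr3_band      -- {time_h: raw intensity of the chromosome III band}
    reference_band -- {time_h: raw intensity of a reference band in the same
                      lane}; a hypothetical normalization control, since the
                      text does not name the one that was used
    t0             -- time point taken as 100% (time point '0' in the text)
    """
    # Normalize each time point to its own lane's reference band.
    normalized = {t: chr3_band[t] / reference_band[t] for t in chr3_band}
    baseline = normalized[t0]
    # Multiply by 100 and divide by the value at time point 0.
    return {t: 100.0 * value / baseline for t, value in normalized.items()}


# Made-up intensities, for illustration only.
example = percent_template_remaining(
    chr3_band={0: 200.0, 2: 150.0, 4: 90.0},
    reference_band={0: 100.0, 2: 100.0, 4: 90.0},
)
```

With these invented numbers the template signal falls to 75% at 2 h and 50% at 4 h.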
Chromatin immunoprecipitation. ChIP analyses of DNA polymerases, Rad51
and Pif1 were performed and quantified as described. The anti-Myc (9E10)
antibody was from Sigma (M4439). Anti-Rfa2 antibody was from W.-D. Heyer
and anti-Rad51 antibody was from our laboratory stock. Experiments were
done at least three separate times and the t-test was used to establish the
statistical significance of the results. The primers used for qPCR were:
MATX-F2 −1 kb (5′-GGTAGGCGAGGACATTATCTATCA-3′) and
MATX-R3 −1 kb (5′-GAAGAATACCAGTTTATCTCGCATTCAAATC-3′);
5 kb RHO-F1 (5′-ATTCACTAACAATGGCTCTAGGAGTGGCG-3′),
5 kb RHO-R1 (5′-CTTGCGGATATCGTGCTACAAAATCAGTC-3′),
10 kb RHO-F1 (5′-TCTCTCCCTTTCAGCAGCTGCTCAGAG-3′),
10 kb RHO-R1 (5′-GAAGAAACACACATCCTCACACGCATATTC-3′),
30 kb RHO-F1 (5′-CTCTCATGGTTCGGACTTACTTAAAACACCC-3′),
30 kb RHO-R1 (5′-AATTCGTTGCGCTTGTGAGGACATCGG-3′),
67 kb RHO-F1 (5′-GCTGCAGTTGCTAATAATCTG-3′) and
67 kb RHO-R1 (5′-CGGGAGGAGTGGAAGCC-3′). These primer pairs are for sites
−1, +5, +10, +30 and +67 kb from the break.

Analysis of non-homologous DNA tail removal. To determine removal of the 
non-homologous Ya tail as a measure of successful DNA strand invasion, we 
performed real-time PCR using genomic DNA and primers specific for the Ya 
tail (642 bp): MATX-F1 (5'-GTTGTTACACTCTCTGGTAACTTAGGTAAA-3’) 
and MATYa-R2 (5’-CAATCTCAGTACCTAGAATGTTAAACAGAG-3’) desig- 
nated as P1 and P2 in Extended Data Fig. 4b. 

Initial DNA synthesis analysis. To measure initial DNA synthesis in BIR, 2 ng of
genomic DNA from different times after DSB induction was amplified by PCR
with the primers: P1-BIR URA3 (5′-ACCCGGGAATCTCGGTCGTAATGA-3′)
and P2-Z1 distal (P1 and P2 in Fig. 1d; 5′-ATCCGTCACCACGTACTTCAGC-3′).
As control, the CHA1 gene on chromosome III was amplified.

Determination of DSB repair by BIR and other mechanisms. To quantify allelic
BIR, we used a disomic strain with an extra, truncated copy of chromosome III
wherein the arm 100 kb distal from MATa is replaced with LEU2 followed by
telomeric repeats. In this assay, the chromosomes participating in repair are
marked by either LEU2, ADE1 or ADE3, which allows determination of the repair
pathway by growth on selective media (Fig. 1a). To avoid any contribution of Ty
transposon repeats to repair, the nearest Ty1 repeats to the DSB were replaced
with a NAT cassette. BIR leads to the loss of the LEU2 marker. When repair
fails, ADE1, NAT and LEU2 markers are lost, and the colonies appear red due to
an Ade1 deficiency. When cells repair the DSB by gene conversion using the short
homology on the right side of the break, all the markers are retained. In rare
cases in wild-type cells, part of the homologous template chromosome is lost due
to a half-crossover that eliminates the ADE3 marker, and colonies are Ade⁻ and
appear white.
The BIR assay was performed by plating on YEP-galactose medium and replica
plating on Leu, Ade dropout and NAT selective media. For each strain, at least
1,000 colonies were scored. The frequencies of BIR, half crossovers, gene
conversion and chromosome loss were estimated based on the percentage of
colonies carrying markers specific for these repair outcomes, as described above
(Fig. 1a) and reported previously. Pedigree analyses and individual product size
analyses by CHEF confirmed repair by BIR in wild-type cells. Because the repair
by BIR occurs in G2/M cells, two copies of each chromosome are present. Owing to
random segregation of chromosomes, half of the half crossovers (the major
product in pif1Δ cells) segregate with an intact copy of the full-length
chromosome III, and are genetically and structurally indistinguishable from BIR
as confirmed by pedigree analysis. Therefore, the number of Ade⁻ white colonies
scored on selective media as half crossovers represents only half of these
events. To correct for this, the number of half crossovers (Ade⁻, white
colonies) was multiplied by two, and consequently the number of BIR events was
adjusted by subtracting the number of Ade⁻, white colonies. Second, upon
analysis of the repair products by CHEF from Ade⁺ colonies that have lost the
distal NAT marker (Extended Data Fig. 2d), we found that about half of them
carried a genomic rearrangement, whereas the other half of the products
corresponded by size to BIR, where strand invasion occurred proximal to the NAT
marker. Therefore, half of the NATˢ colonies were scored as gross chromosomal
rearrangements and the other half as BIR events. The number of BIR events in
pif1Δ mutants was still probably overestimated because about one-third of the
Ade⁺ NATᴿ Leu⁻ colonies did not result from allelic BIR, but from a half
crossover event associated with a stabilization of the part of the template
chromosome carrying the ADE3 marker (Extended Data Fig. 2b).
The percentage of cells that repair a DSB by ectopic BIR (BIR between two short
homologous sequences located on heterologous chromosomes) was calculated as
the number of canavanine-sensitive (Canˢ) colonies (Extended Data Fig. 3a)
formed on YEP-Gal plates divided by the number of all colonies formed on
YEPD plates, multiplied by 100. In this BIR assay, recombination between two
truncated copies of the CAN1 gene located on chromosomes V and XI leads to the
formation of an intact CAN1 gene, resulting in canavanine sensitivity. In pif1Δ
cells the number of BIR events (~1%) is probably overestimated because over half
of the cells that form the full-length CAN1 gene during DSB repair do not
complete repair (Extended Data Fig. 3d). Additionally, some of the products that
appear to be BIR can correspond to half crossovers that cannot be scored in this
assay (haploid cells): half-crossovers that segregate with an intact template
chromatid cannot be distinguished from BIR, and half-crossovers that lose a
large part of the template chromosome are inviable.
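
The colony-count corrections described in this section can be written out as a short sketch. It is illustrative only: the function and argument names are hypothetical, the example counts are invented, and it assumes that the NAT-sensitive Ade⁺ colonies are tallied as a class separate from the BIR-like (Ade⁺ NATᴿ Leu⁻) colonies, which the text does not state explicitly.

```python
def correct_allelic_bir_counts(bir_like, white_ade_minus, nat_sensitive):
    """Apply the two corrections described for the allelic BIR assay.

    bir_like        -- colonies scored as BIR by markers (assumed class)
    white_ade_minus -- Ade-minus white colonies scored as half crossovers
    nat_sensitive   -- Ade-plus colonies that lost the distal NAT marker
                       (assumed to be counted separately from bir_like)
    """
    # Half of all half crossovers co-segregate with an intact chromosome III
    # and look like BIR, so the observed white colonies are doubled ...
    half_crossovers = 2 * white_ade_minus
    # ... and the same number is subtracted from the BIR-like class.
    bir = bir_like - white_ade_minus
    # NAT-sensitive colonies are split evenly between gross chromosomal
    # rearrangements (GCR) and BIR.
    gcr = nat_sensitive / 2
    bir += nat_sensitive / 2
    return {"BIR": bir, "half_crossover": half_crossovers, "GCR": gcr}


def ectopic_bir_percent(can_sensitive_on_gal, colonies_on_yepd):
    """Canavanine-sensitive colonies on YEP-Gal per 100 colonies on YEPD."""
    return 100.0 * can_sensitive_on_gal / colonies_on_yepd


# Invented counts, for illustration only.
corrected = correct_allelic_bir_counts(
    bir_like=400, white_ade_minus=50, nat_sensitive=20
)
ectopic = ectopic_bir_percent(can_sensitive_on_gal=12, colonies_on_yepd=1200)
```

Note that the sketch does not model the further overestimate the text mentions for pif1Δ cells; that remains a caveat on the corrected numbers.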

Measurements of crossover frequency, DSB repair and viability in ectopic
recombination assay. We determined the frequency of crossovers, viability and
efficiency of the DSB repair as previously described.

Measurement of ssDNA formation by qPCR. The ssDNA amount was measured as
previously described 10 and 41 kb away from the DSB end and, as a control, on
chromosome V, which does not participate in recombination. The primers
used in this assay were:
oMW 1082 10832bp-F (5′-CACTAAGTTCTTGGACAGGT-3′);
oMW 1083 10832bp-R (5′-AATACTGGTCATGAAGCCAC-3′);
oMW 1116 ChrV 41701bp-F (5′-GGCAGCCACCCTTATGGTGAGG-3′);
oMW 1117 ChrV 41701bp-R (5′-GGCCGCAAGGGCCAAGACAAGG-3′);
oMW 1120 40890-F (5′-CCTTTCACCGTCTATGGGCC-3′);
oMW 1121 40890-R (5′-CAATTCCTCATTTCCATCGG-3′).

Measurement of conversion tracts. We used an assay in which an HO break is
generated within LEU2 and is repaired by allelic recombination with leu2-R. In
this assay, Leu⁺ colonies stem from short conversion tracts and Leu⁻ colonies
from longer conversion tracts. Cells were plated on YEP-Gal plates and then
replica plated on medium lacking leucine.

Preparation of proteins. Pif1 and pif1(K264A): PIF1 cDNA encoding the nuclear
form of Pif1 (amino acids 40-859) was inserted in the pRSF-Duet-1 vector
(Novagen) to add an amino-terminal 6×His tag. The K264A mutation was
introduced by QuikChange mutagenesis (Agilent Technologies). Pif1 and
pif1(K264A) were expressed in E. coli Rosetta cells (Novagen), with induction by
0.1 mM isopropyl-1-thio-β-D-galactopyranoside (IPTG) for 18 h at 16 °C. The cell
paste from a 10-l culture was resuspended and sonicated in 100 ml buffer A
(20 mM K2HPO4, pH 7.5, 0.5 mM EDTA, 10% glycerol, 0.01% Igepal, 2 mM DTT)
containing 100 mM KCl and protease inhibitors (aprotinin, chymostatin,
leupeptin and pepstatin A at 5 µg ml⁻¹ each, 1 mM phenylmethylsulphonyl
fluoride). The lysate was clarified by ultracentrifugation and loaded onto a Q
sepharose column (8 ml). The flow-through fraction was applied onto a
SP-sepharose column (8 ml) and fractionated with a 90-ml gradient of
150-660 mM KCl in buffer A. Fractions containing Pif1 were incubated with 2 ml
Ni-NTA-agarose beads (Qiagen) and 10 mM imidazole for 2 h. After washing three
times with 10 ml buffer A containing 1 M KCl, 1 mM ATP, 8 mM MgCl2 and 15 mM
imidazole, the bound proteins were eluted with 20 ml buffer A containing
150 mM KCl and 200 mM imidazole. The protein pool was fractionated in Mono S
(1 ml) with a 40-ml gradient of 150-600 mM KCl in buffer A. Pif1 was
concentrated in an Ultracel-30K concentrator (Amicon) and stored at −80 °C.

Polδ and Polδ*: Polδ (Flag-Pol3, GST-Pol31 and Pol32) and Polδ* (Flag-Pol3,
GST-Pol31) were expressed in S. cerevisiae strain YRP654. Cells from a 10-l
culture were disrupted using a coffee grinder and resuspended in 100 ml buffer B
(50 mM Tris-HCl, pH 7.5, 10% sucrose, 1 mM EDTA, 175 mM (NH4)2SO4, 200 mM NaCl,
1 mM DTT, 0.01% Igepal, protease inhibitors as above). After
ultracentrifugation, the lysate was treated with 0.277 g ml⁻¹ of (NH4)2SO4. The
precipitate was pelleted by centrifugation and dissolved in 100 ml buffer C
(25 mM Tris-HCl, pH 7.5, 0.5 mM EDTA, 10% glycerol, 0.01% Igepal, 10 mM
2-mercaptoethanol and 200 mM KCl) and dialysed against the same buffer. The
protein was purified by affinity chromatography in glutathione Sepharose (GE
Healthcare; 5 ml) and anti-Flag M2 resin (Sigma; 2 ml). Protein was
concentrated and stored at −80 °C.

RFC: RFC (GST-RFC1, RFC2, RFC3, RFC4, RFC5) was expressed in S. cerevisiae
strain YRP654 using pBJ1476 (2µ, GAL-PGK-GST-RFC1/RFC4/RFC5, LEU-2d)
and pBJ1469 (2µ, GAL-PGK-RFC2/RFC3, TRP1) and purified from clarified cell
lysate by (NH4)2SO4 precipitation and affinity purification using glutathione
sepharose as above.

Pol32: MBP-Pol32 (with MBP cleavable with TEV protease) was expressed in
E. coli Rosetta cells harbouring pMAL-POL32 with induction by 1 mM IPTG for
4 h at 37 °C. The cell paste from a 500-ml culture was resuspended in 50 ml
buffer D (50 mM Tris-HCl, pH 7.5, 10% sucrose, 1 mM EDTA, 0.01% Igepal, 1 mM
2-mercaptoethanol, 150 mM KCl and protease inhibitors as above), sonicated, and
clarified by ultracentrifugation. Nucleic acids were removed by adding 700 µl
of 10% polyethyleneimine (J. T. Baker) and centrifugation. MBP-Pol32 was
purified by affinity chromatography with 6 ml amylose resin (BioLabs) and
fractionation in 1 ml Source S with a 30-ml gradient of 100-500 mM KCl in
buffer E (25 mM Tris-HCl, pH 7.5, 10% glycerol, 0.5 mM EDTA, 0.01% Igepal, 1 mM
2-mercaptoethanol), and in 1 ml macrohydroxyapatite using a 30-ml gradient of
0-300 mM KH2PO4 in buffer E. MBP-Pol32 was concentrated and stored at −80 °C.

Rrm3: DNA that harbours RRM3-Flag was cloned into the pMAL-TEV vector
(BioLabs) to add an N-terminal MBP tag. Expression was in E. coli Rosetta cells
with induction by 0.1 mM IPTG for 24 h at 12 °C. The cell paste from a 3.3-l
culture was resuspended and sonicated in 40 ml buffer F (25 mM Tris-HCl,
pH 7.5, 0.5 mM EDTA, 10% glycerol, 0.01% NP-40, 1 mM DTT, 500 mM KCl,
0.2 mM ATP, 5 mM MgCl2 and protease inhibitors as above). The lysate was
clarified by ultracentrifugation and treated with 0.277 g ml⁻¹ of (NH4)2SO4.
The protein precipitate was pelleted by centrifugation and dissolved in 30 ml
buffer F. MBP-Rrm3 was purified by two-step affinity chromatography with
amylose resin (2 ml) and anti-Flag M2 resin (0.7 ml). Rrm3 was concentrated and
stored at −80 °C.

Rad51, Rad54, RPA, PCNA and Mph1 were purified as described elsewhere.
DinG was from D. Camerini-Otero.

D-loop extension. The ³²P-labelled 90-mer oligonucleotide (2.4 µM nucleotides),
homologous to positions 1932-2021 of pBluescript DNA, was incubated with
Rad51 (800 nM) in buffer G (35 mM Tris-HCl, pH 7.5, 1 mM DTT, 7 mM MgCl2)
containing 100 ng µl⁻¹ BSA, 30 mM KCl, 2 mM ATP, an ATP-regenerating system
(20 mM creatine phosphate, 30 ng µl⁻¹ creatine kinase), and 100 µM each of
the four dNTPs for 10 min at 37 °C. This was followed by a 5-min incubation with
RPA (400 nM) at 30 °C, a 2-min incubation with Rad54 (200 nM) at 23 °C, and a
2-min incubation with pBluescript DNA (37 µM base pairs) at 30 °C. For D-loop
extension, the reaction was mixed with PCNA (200 nM) and RFC (200 nM) and
incubated on ice for 2 min. Then, Polδ (100 nM) and Pif1 (13-40 nM) were added
to the reaction, followed by an incubation at 15 °C. Reaction mixtures were
deproteinized with 0.5% SDS and 0.5 mg ml⁻¹ proteinase K for 10 min at 37 °C
before being resolved in a native gel (0.8% agarose) in TAE buffer (40 mM
Tris-acetate, pH 7.5, 0.5 mM EDTA), or in a denaturing gel (4% polyacrylamide,
7 M urea) in TBE buffer (90 mM Tris-HCl, pH 8.3, 90 mM boric acid, 2 mM EDTA),
or in a 0.9% agarose gel in 50 mM NaOH, 1 mM EDTA (Extended Data Fig. 5d). Dried
gels were analysed in a phosphorimager (BioRad).

For quantification of DNA synthesis in Fig. 3b, D-loop extension was carried
out as above, except that the invading strand was unlabelled and the reaction
was supplemented with [α-³²P]dCTP (80 nCi µl⁻¹). The reaction products were
resolved in a native gel and analysed.
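
Because the DNA concentrations in the D-loop reaction are given per nucleotide or per base pair, it can help to convert them to molecule concentrations. The sketch below does this arithmetic, taking the stated values as micromolar nucleotides (90-mer) and micromolar base pairs (pBluescript, 2,961 bp); the helper name is hypothetical and the derived numbers are not reported in the text.

```python
def molecules_nM(conc_uM_residues, length_residues):
    """Convert a concentration in µM of nucleotides (or base pairs) into
    nM of DNA molecules of the given length."""
    return 1000.0 * conc_uM_residues / length_residues


# 90-mer invading strand at 2.4 µM nucleotides: about 26.7 nM molecules.
oligo_nM = molecules_nM(2.4, 90)

# pBluescript (2,961 bp) at 37 µM base pairs: about 12.5 nM molecules.
plasmid_nM = molecules_nM(37.0, 2961)

# Nucleotides of invading strand per Rad51 monomer (800 nM Rad51).
nt_per_rad51 = 2.4 * 1000.0 / 800.0
```

On these assumptions the oligonucleotide is in roughly twofold molar excess over the plasmid, and Rad51 is present at one monomer per three nucleotides of the invading strand.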

DNA extension from deproteinized D-loop. The D-loop reaction (250 µl) was
performed as above with the ³²P-labelled 90-mer oligonucleotide. The reaction
was deproteinized with SDS and proteinase K as above. After an extraction with
phenol-chloroform-isoamyl alcohol (25:24:1), the buffer was exchanged with
buffer H (35 mM Tris-HCl, pH 7.5, 1 mM DTT, 9.3 mM MgCl2 and 30 mM KCl)
using a Zeba Spin Desalting Column (Thermo Scientific). The DNA synthesis
reaction was carried out with the deproteinized D-loop (equivalent to 1.2 µM
nucleotides of the ³²P-labelled 90-mer oligonucleotide), with 2 mM ATP, the
ATP-regenerating system, 100 ng µl⁻¹ BSA and 100 µM each dNTP, 200 nM RPA,
100 nM PCNA, 100 nM RFC, 50 nM Polδ, and 8, 16 or 24 nM Pif1 with an 8-min
incubation.

Two-dimensional gel electrophoresis. Deproteinized reaction mixtures were run 
in a 0.8% agarose gel in TAE buffer. Then, lanes containing the radiolabelled 
species were excised and placed on top of a 0.9% agarose gel. Electrophoresis in 
the second dimension was done in 50 mM NaOH, 1 mM EDTA. A gel strip from 
the first dimension is shown above the two-dimensional gel. 

Pull-down assay. Pol32 and Polδ*: MBP-Pol32 (5 µg) or TEV-protease-treated
MBP-Pol32 was incubated with Polδ* (5 µg) in 20 µl buffer I (25 mM Tris-HCl,
pH 7.5, 10% glycerol, 1 mM DTT, 0.01% Igepal, 150 mM KCl) for 30 min on ice,
then mixed with 10 µl glutathione sepharose for 1 h at 4 °C. The resin was
washed four times with 100 µl buffer I, then eluted with 20 µl 2% SDS. The
supernatant containing unbound proteins, final wash and SDS eluate, 10 µl each,
were analysed by SDS-PAGE and Coomassie blue staining.

Pif1 and PCNA: 6×His-Pif1 (3 µg) was incubated with PCNA (3 µg) in 30 µl
buffer J (25 mM Tris-HCl, pH 7.5, 0.01% Igepal, 1 mM 2-mercaptoethanol,
100 mM KCl) containing 20 mM imidazole for 30 min at 4 °C, then mixed with
6 µl Ni-NTA agarose for 1 h at 4 °C. The resin was washed three times with
50 µl buffer J, then eluted with 20 µl 2% SDS and analysed as above.

Pif1 and Polδ: Polδ (9 µg) with GST-Pol31 was incubated with Pif1 (3 µg) or
PCNA (3 µg) in 30 µl buffer J for 30 min at 4 °C, then mixed with 12 µl
glutathione sepharose for 1 h at 4 °C. The resin was washed three times with
40 µl buffer J, then eluted with 25 µl 2% SDS and analysed as above.

Restriction digests of extended D-loops. Reactions (20 µl) were extracted with
phenol-chloroform-isoamyl alcohol and the DNA was precipitated with ethanol,
dissolved in 10 µl buffer K (20 mM Tris-acetate, pH 7.9, 10 mM Mg-acetate,
50 mM K-acetate, 1 mM DTT, 100 ng µl⁻¹ BSA) and incubated with AhdI
(0.25 U µl⁻¹) or XmnI (1 U µl⁻¹) at 37 °C for 10 min. SDS was added to 0.5% and
the DNA species were separated on a denaturing gel.

Electron microscopy. DNA from 40 µl of reactions at a 16-min time point was
dissolved in 20 µl buffer L (25 mM HEPES, pH 7.5, 10 mM MgCl2 and 50 mM
KCl). T4 gp32 protein (NEB) was added to 1.5 µM, and after a 10-min incubation
at 25 °C, crosslinking of protein to DNA was carried out using glutaraldehyde
(0.6%) at 25 °C for 5 min. DNA was purified in 2-ml 6% agarose beads (Agarose
Bead Technologies) equilibrated with TE buffer (10 mM Tris-HCl, pH 7.5 and
1 mM EDTA). The samples were adsorbed onto glow-charged thin carbon support
in TE buffer containing 2.5 mM spermidine, dehydrated through a series of water/
ethanol washes, and air dried. The electron microscopy grids were shadowed by
rotary tungsten coating at 1 X 10’ torr and examined in an FEI Tecnai 12 TEM at 
40 kV. Images were captured using an Ultrascan400 scan CCD camera (Gatan 
Inc.). Adobe Photoshop was used to invert images. 

Extended Data Figure 1 | Analysis of BIR and conversion tracts in pif1Δ
mutants and crossover frequency in polymerase mutants. a, Southern blot
analysis of BIR product formation and template chromosome maintenance.
Chromosomes were separated by pulsed-field gel electrophoresis and a DNA
probe specific for either ADE1 or ADE3 was used. Quantification is shown in
Fig. 1c. b, c, Analysis of DSB repair outcomes in the BIR assay of the indicated
mutants. d, Schematic of allelic recombination between the leu2 alleles (left).
Longer conversion tracts associated with conversion of ‘R’ lead to formation of
Leu⁻ recombinants, whereas shorter conversion tracts lead to formation of
Leu⁺ recombinants. Quantification of gene conversion events with shorter
conversion tracts (Leu⁺) in wild-type (WT) and pif1Δ cells is shown. The
difference between wild-type and pif1Δ cells is statistically significant,
P < 0.0001. e, Southern blot analysis of gene conversion with and without
crossing over in the indicated strains using the ectopic recombination assay
shown in Fig. 2a. Quantification of crossover product in the indicated mutants
compared to wild type, which is set to 1. Plotted are the mean values ± s.d.;
n = 3.
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Extended Data Figure 2 | Analysis of recombination products in Pifl- b, Analysis of recombination products from Ade’ NAT® Leu colonies. 
deficient cells. a, Illustration of the half-crossover pathway where the part of | Examples are shown where rearrangements of the template chromosome are 
the template chromosome distal to the initial invasion site is fused to the broken _ indicated by an asterisk. c, Analysis of half-crossover recombination products 
chromosome, with the remainder of the template chromosome either from Ade NAT® Leu’ colonies. d, Analysis of rare NAT® Ade* colonies. 
becoming stabilized (examples shown in Fig. 2b) or lost (as shown in Fig. 2c). 
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ectopic BIR assay. b, Southern blot analysis of ectopic BIR kinetics in wild-type in pif1A (d) and wild-type cells (e). Examples where synthesis is initiated but 
and pif1A cells. A probe specific for the MCH2 gene located at the end of not finished are indicated by an asterisk. In these cases, a functional CAN] gene 


chromosome XI was used in the analysis. c, Quantification of ectopic BIRrepair _ is formed but synthesis is abandoned resulting in shorter products. Red circles 
(Can colonies) in wild-type and pif1A cells. Plotted are the mean values + s.d. indicate major rearrangement of chromosome V or template chromosome XI. 


©2013 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 4 image. Panel labels: Rad51 ChIP; Pif1-13xMyc ChIP
(−1 kb and +10 kb); without crosslinking; no protein tagged with Myc. Strains
shown: WT, pif1-m2, rad51Δ, rad54Δ. Axes: fold change (IP) against time after
DSB (h).]


Extended Data Figure 4 | Analysis of Pif1’s role in the initial steps of BIR
and of Pif1 recruitment at the DSB and template. a, b, Analysis of initial
strand invasion in wild-type and pif1Δ cells. a, Enrichment of Rad51 at the DSB
site and template by ChIP analysis. b, Kinetics of removal of the non-
homologous Ya tail by qPCR analysis in wild-type and pif1-m2 strains
compared to the control strains rad51Δ and rad54Δ that are defective in strand
invasion. c, Enrichment of Pif1 at the DSB and template by ChIP analysis in
wild-type, pol32 and rad52 cells. The regions amplified by qPCR are indicated.
d, Control ChIP experiments in the PIF1-13xMyc strain where crosslinking
was omitted and in a strain where the Myc tag was absent. a-d, plotted are the
mean values ± s.d. n = 3.

Extended Data Figure 5 | Quality analyses of proteins, protein
requirements for DNA extension, effect of Pif1 on D-loop stability, and
time-course analysis of DNA extension. a, Purified Pif1 and pif1(K264A)
were analysed by SDS-PAGE and staining with Coomassie blue. b, The plasmid
DNA in all the lanes was pBluescript SK replicative form I (RFI). DNA
synthesis reactions were performed with 13, 27 and 40 nM Pif1 and the reaction
mixtures (lanes 1-8) from the 8-min time point were incubated at 95 °C for
2 min to disrupt the D-loop, followed by native gel electrophoresis and staining
with ethidium bromide. Various other DNA forms (lane 9, plasmid DNA alone;
lane 10, plasmid DNA linearized with XhoI; lane 11, plasmid DNA relaxed by
calf thymus topoisomerase I; lane 12, plasmid DNA relaxed by E. coli
topoisomerase I; lane 13, plasmid DNA digested with DNase I) are shown.
c, DNA synthesis reactions by Polδ in conjunction with Pif1 (40 nM Pif1 and
8-min incubation) with the omission of one or more of the protein factors or
dNTPs, as indicated. The reaction products were analysed in a native gel (top)
or denaturing gel (bottom). Note that a substantial portion of the D-loop was
dissociated by Pif1 in the absence of PCNA, RFC, Polδ, or dNTPs (lanes 5, 7, 9
and 19). d, Time course of DNA synthesis by Polδ in conjunction with Pif1
(40 nM Pif1). The reaction products were analysed in a native gel (top) or a
denaturing gel (bottom).

Extended Data Figure 6 | Effect of Rad51 and/or Rad54 removal on DNA
extension. a, DNA synthesis from a deproteinized D-loop by Polδ in
conjunction with Pif1 (8, 16 and 24 nM) was examined. Pif1 was at 24 nM in
lane 6. The reaction products were analysed in a native gel (left) or denaturing
gel (right). b, After the D-loop reaction had proceeded for 2 min, Rad54, which
is highly heat labile, was inactivated by incubation at 42 °C for 20 min. DNA
extension reaction and analysis were then performed by adding RPA, RFC,
PCNA, Polδ and Pif1 (40 nM Pif1 and 8-min incubation). The reaction
products were analysed in a native gel (left) or denaturing gel (right). The
inactivation of Rad54 was verified by examining the ATPase activity of Rad54,
which decreased by ~95% compared to the unheated control (data not shown).

Extended Data Figure 7 | Specificity of Polδ-Pif1-mediated DNA
extension. a, DNA synthesis reactions were conducted with Polδ and Pif1,
Mph1, Rrm3, or DinG (13, 27, 40, 120 nM). The reaction products from the
8-min time point were resolved in a native (top) or denaturing gel (bottom)
(lane 1, no protein control; lane 2, D-loop formed by Rad51-Rad54; lanes 3-20,
D-loop extended with Polδ and the indicated helicase). b, E. coli DNA
polymerase I Klenow fragment (100 nM, from NEB) was tested for DNA
extension with Pif1 (13, 27, 40 nM) with or without PCNA (200 nM) and RFC
(200 nM). The reaction products from the 8-min time point were analysed in a
native gel.

Extended Data Figure 8 | Requirement for the Polδ subunit Pol32 in DNA
extension, and interaction of Pif1 with PCNA. a, Purified Polδ (Flag-Pol3,
GST-Pol31, Pol32), Polδ* (Flag-Pol3, GST-Pol31) and MBP-Pol32 were
analysed by SDS-PAGE and staining with Coomassie blue. b, Pull-down assay
to examine Pol32-Polδ* interaction. c, DNA synthesis was performed with
Polδ or Polδ* (20 or 40 nM) with Pif1 (40 nM). In lanes 10-13, Polδ* and Pol32
(125 nM) were pre-incubated on ice for 10 min before use. The reaction
products from the 8-min time point were resolved in a native gel (left) or
denaturing gel (right). d, Pull-down reactions of 6×His-Pif1 and PCNA (left),
Polδ (Flag-Pol3, GST-Pol31, Pol32) and PCNA, Polδ and Pif1 (right). E, SDS
eluate; S, supernatant; W, wash.

Extended Data Figure 9 | Effect of topoisomerase I in DNA extension.
a, DNA synthesis products, initiated by Polδ for 4 min, and then continued with
Pif1 (13, 27, 40 nM) with and without calf thymus topoisomerase I (0.4 U μl⁻¹)
for 8 min. The reaction mixtures were resolved in a native gel (top) or
denaturing gel (middle). Lanes 1 and 2 contained DNA substrates only and
lanes 3-12 contained D-loop made by Rad51-Rad54. An overexposed image
and the scan of lanes 5 and 6 to highlight the effect of topoisomerase when Pif1
was absent are shown (bottom). b, Two-dimensional gel analysis of the
extension products. The reaction products, prepared as in lanes 5, 6, 11 and 12
of panel a, were subject to two-dimensional gel analysis.

Extended Data Figure 10 | Measurement of ssDNA intermediates
formed during BIR and analysis of BIR efficiency in the absence of Psf2 and
Mcm4. a, Schematic of the assay. b, Measurement of the relative increase of
ssDNA at the indicated time after DSB induction compared to the amount of
ssDNA in logarithmically growing cells (t = 0). Measurement of ssDNA
intermediate 10 and 40 kb from the site of strand invasion at the template
chromosome and at a control locus on chromosome V which does not
participate in recombination. Plotted are the mean values ± s.d. n = 3. c, An
analysis of the growth of cells harbouring temperature-sensitive degron alleles
of td-mcm4 and td-psf2. Both strains are inviable at 37 °C even without
overexpression of the ubiquitin ligase Ubr1. d, Western blot analysis of
td-Mcm4 and td-Psf2 protein degradation. e, Southern blot analysis of the BIR
assay in cells with conditional depletion of td-Mcm4 or td-Psf2. Quantification
of the Southern blots is shown in Fig. 2f.


Pack a punch


Grant reviewers are increasingly focusing on the scientific 
and social impact of proposed research projects. 


BY AMBER DANCE 


When a few dozen scientists in a US National Institutes of Health
(NIH) study section sit down to start reviewing grant applications,
they have one main question on their minds, says Fatah Kashanchi,
who has participated in more than 100 such sessions. Does the
proposal lay out a significant question? “If it’s not important,
then you shouldn't be spending your time — and other people’s
money — on this,” says Kashanchi, a former NIH virologist now
at George Mason University in Manassas, Virginia.

Public and private granting bodies across the world focus on the
impact of research. But the terms used to describe impact, and the
types of impact that the bodies are interested in, vary widely. Some
funders, such as the NIH, are mainly concerned with the project's
importance in a specific field. Others expect grant recipients to
make a splash beyond laboratories, publications and conferences —
they are looking for implications for the economy, education or
elsewhere in society.

Interest in broader impact is rising. In 2009, the seven
government-funded granting agencies that make up Research Councils UK
(RCUK) began requiring applicants to delineate their impact plans.
The Swiss National Science Foundation (SNSF) added a section on broad
impact to its application forms in 2011. The US National Science
Foundation (NSF) has long required applicants to combine scientific
value with impact outside the lab, and in 1997 made broader impact an
explicit part of the grant review. The foundation started requiring a
separate section on impacts in applications this year.

Why the increased emphasis? It is attributable mainly to governments
with ever-shallower pockets wanting to know that the research they pay
for will pay off in the real world. “A scientist's ability to sell his
research is becoming more and more important,” says Meg Bouvier, a
medical writer in Amherst, Massachusetts, who has helped clients to
win millions of dollars in NIH grants.

Some scientists worry that heightened attention to impact will draw
funds away from basic, ‘blue skies’ science in favour of applied
projects. When the RCUK first introduced impact statements, “a small
but vociferous group of scientists were not keen on what they termed
the ‘impact agenda’”, says Alexandra Saxon, head of the RCUK’s
strategy unit in Swindon. But reviewers’ interest in impact does not
have to pose a risk to basic studies, says Bill Petri, a biomedical
scientist at the University of Virginia in Charlottesville who has
scored millions of dollars from the NIH. “You can make a compelling
case for the most fundamental of science being impactful,” he says.


VARIETIES OF IMPACT 
Scientific significance is always a high priority. 
NIH grant applicants must explain the scien- 
tific value of their projects at several points in 
the application, including the abstract and the 
Significance section. The first sentences of 
the Specific Aims section should clearly lay 
out the epidemiology of the health issue at 
hand, says Kashanchi. The Wellcome Trust, a 
biomedical funding charity in London, does 
not specifically ask about impact, but does 
expect the Vision section of the proposal to 
mention the importance of the topic. Scientific
significance takes a back seat only in certain
early-career grants, such as the NIH’s Career
Development, or ‘K’ awards, in which the
long-term potential of the applicant may
outweigh the importance of the project.

At some agencies, broader impact also comes into play. For RCUK
bodies, applicants must write a short, plain-English Impact Summary,
explaining who might benefit from their research, and how. They must
also submit a Pathways to Impact statement, describing how they will
engage with those beneficiaries. Applicants might plan to partner
with industry, for example, or develop an educational programme.

The type of broader-impact project can 
differ between disciplines. A mathematician 
could explain his or her research to scientists 
in other fields, who might find it useful for 
modelling their own systems, suggests John 
Hand, head of impact at the UK Engineering 
and Physical Sciences Research Council. Engi- 
neers, by contrast, might offer applied projects 
with more direct practical impact, such as ways 
to scale up production processes. 

The RCUK bodies not only want to hear 
about impact ideas, they also want to pay for 
them. Saxon says that applicants might reason- 
ably ask for roughly 5% of the grant budget to 
go towards impact activities — paying for a 
research associate to work in an industry lab 
for several months, for example. 


BROAD BASE 
For some grants, broad impact is optional. 
SNSF applicants can choose whether to designate
their project ‘use-inspired’. If they do
select this label, they must explain the practical
implications of the work at some point in the 
application. A use-inspired proposal does not 
necessarily give applicants an advantage, but it 
does help the SNSF to know whether it should 
recruit non-scientist reviewers — for example, 
clinicians for biomedical proposals. 

For NSF grants, which do require evidence of impact, it used to be
sufficient to mention publications or presentations. But today the
agency wants more direct societal benefits, says Ed Hackett, a social
scientist at Arizona State University in Tempe who has worked at the
NSF. With the competition so stiff, a good case for broader impact
could make the difference between success and failure, says Hackett.
Projects might include visiting schools, developing educational
materials, communicating science to the public, training young
scientists or collaborating with local industries.

To find out what kind of impact information an agency is looking for,
applicants should check the agency’s mission statement, suggests
Bouvier. Even better, she says, ask a programme officer about
priorities.

There are many ways to address broad impact, if that is what the
agency is asking for. Hackett recommends looking beyond the lab and
university. For example, a researcher might talk to parents to find
out what gets their children excited about science, and tailor an
educational programme to match. Or an engineer might chat to local
industry figures about their environmental concerns, and work out how
to use academic inventions to solve their problems. University
knowledge- or technology-transfer offices may be able to help
scientists to forge relationships with industry partners, says Saxon,
and those partners could inspire impact ideas or collaborate with
academic scientists to carry out impact activities. Knowledge-transfer
officers may even be able to help scientists to brainstorm ideas or
craft impact statements, she adds.


IMPACT FACTOR

Stand out from the crowd

Individual granting agencies deal with research significance and
broader impact in different ways, so be sure to check specific
instructions when applying. Here are some general tips.

● Look up the mission statement of the granting agency — your
proposal should fit its aims.
● Use online databases, such as the US National Institutes of Health’s
REPORTER tool (http://projectreporter.nih.gov), to find out what kinds
of research an agency funds.
● Contact programme officers to understand what kind of impact the
agency is looking for.
● Significance starts with your research question. Address an
important issue, rather than proposing an incremental advance.
● Describe the significance of the research up front, and continue to
back up your argument throughout the application.
● Point out where current knowledge is insufficient, and how you aim
to fix that.
● Do not assume that reviewers will find the significance obvious.
Make it clear even to lay readers.
● You should be able to sum up your impact in a few punchy sentences.
Be specific. Phrases such as “Our research will improve the health of
Americans” are too broad.
● Mention it if your research addresses an underserved population,
such as people at an economic disadvantage or rural communities
without ready access to medical care. Also say if you will be
collaborating with people who are underrepresented in science.
● When broader impact is a priority, put as much creative thought into
impact as you do into the scientific portion of your application.
● Confer with people outside your field, and outside science, to
brainstorm impact ideas.
● Include costs for impact activities in your grant proposal. A.D.

NSF applicants from a microscopy group at 
the University of Illinois at Urbana-Champaign 
sought a new scanning electron microscope, 
and included a plan to involve schoolchildren 
in the project. The application was successful, 
and since 1999 the Bugscope programme has 
invited students of all ages from around the 
world to send in insect samples, giving them 
the chance to control the microscope remotely 
to look at them. “We're using scanning electron
microscopy and insects as a ‘Trojan horse’ to get
kids interested in the possibility of science as a
career choice,” says Scott Robinson, a microscopist
at the university.


MAKING THE CASE 

Selling a project's significance means targeting the appropriate
audience. Reviewers may not all be experts in the field, and some may
not even be scientists. The American Heart Association in Dallas,
Texas, this year added lay volunteers to the review process, to help
to find studies in line with the association’s mission of making
people free of stroke and cardiovascular disease.

Some other agencies require lay summaries as part of the application:
the significance of a proposal “has to be spelled out for the
least-expert person on the review committee”, says Petri.

The application should also identify a gap in 
current knowledge that the applicant plans to 
fill. “Give a sense of why we're losing an opportunity
if we don't fund this research,” says Jane
Aubin, chief scientific officer at the Canadian
Institutes of Health Research in Ottawa (see 
Nature 482, 429-431; 2012). Bouvier recalls 
one client with a basic-science project on brain 
development. He could not get a grant until he 
pointed out in an application that people can 
get tumours in the brain region he wanted to 
investigate (see ‘Stand out from the crowd’). 

Statistics matter too. Kashanchi wants to read something like, “More
than 20 million people are affected by X.” Bouvier also wants to be
blown away by horrible numbers. “If I don't pause and say, ‘Oh my God,
that’s awful’, then it’s not well written,” she says.

Overall, the key to identifying areas of impact 
is empathy, says Mark Reed, an environmental 
researcher at Birmingham City University, UK, 
who is funded by two RCUK bodies. His work 
has spawned a music video and children’s book 
about the importance of preserving peatlands.
“It’s about putting yourself in the shoes of the
people who might use your work.” ■


Amber Dance is a freelance writer in
Torrance, California.


COLUMN


Birth and rebirth 


Even if one PhD experience turns sour, another could 
offer the right opportunity, says Susie Crowe. 


On the day I met my future husband, I confirmed the start of my PhD.
Those two events would set my personal and professional lives on a
collision course.

We were both in North Carolina, at a beach 
house full of windsurfers, as I decided the next 
four to six years of my life. With him in ear- 
shot, I made the call that initiated my PhD, 
probably hoping that I would come across 
as impressively intellectual and sexy. That 
call would keep us five hours apart — me in 
Toronto, Canada, him in Ottawa — for a year. 

Long-distance relationships are common 
in academia. It is the nature of the beast: 
highly specialized and sparsely studied top- 
ics can take us just about anywhere, often 
away from significant others. Some couples 
can make this work for a while. We could not. 

Relationship woes compounded a feeling 
that I was on the wrong track. My research 
proposal stalled. I fell into a depression. My 
topic was at odds with my new goal of a 
family-oriented, non-tenure-track job with 
work-life balance. And the person I wanted 
to share my life with was losing faith in us. 

There were positives: a supportive supervi- 
sor, a challenging teaching-assistant post and 
a colony of Madagascar hissing cockroaches 
(every girl’s dream!). But it was not working. 

So, after much soul-searching, I took a job
with a non-governmental organization closer 
to Ottawa, where my boyfriend worked and 
where both of our families are based. And I 
did the unthinkable: I quit my PhD. 

It was excruciating. I thought that my doubts would make me look
weak. My adviser was empathetic, but I could not bear to tell him
that I had made the wrong choice by joining his lab.

I wonder if there should be an ‘academic redirection office’ in every
university, specially designed for students who want to withdraw. I
can see the pamphlets: ‘So you're considering leaving your PhD...’
Pictured would be a perplexed and dismayed 20-something. Inside would
be lists and tips: typical reasons for wanting to quit; pros and
cons; reasons you might have started in the first place.

However, even after I had boldly gone 
where no one (that I knew) had gone before, 
my academic career did not end. Soon after 
quitting, I discovered a landscape ecologist 
in Ottawa whose work fascinated me. Her 
work had the practical applications and ties 
to public-sector priorities that my previous 
field had lacked. I disclosed my fickle past, 
and she still accepted me. With high hopes, 
I marched headlong into a new PhD project. 

I would like to say that quitting never 
crossed my mind again. In fact, my qualify- 
ing period included episodes of frantic job- 
searching fuelled by self-doubt. But I jumped 
that hurdle. I finished two field seasons of
data collection, and I am one statistics project
away from being ‘all but dissertation’. I
am in my third year, and the journey is far
from over. But I am committed, and I know
what is required of me. I will finish this.

A few months after my wedding, I had 
taken to wearing baggy sweaters. I walked 
hunched over. I bit my nails. Finally, in a meet- 
ing with my supervisor, I blurted it out: “I need 
to tell you something... I’m pregnant.” 

“That’s wonderful!” she responded 
warmly. “How much leave would you like to 
take?” I blushed, embarrassed at having been 
worried about her reaction. 

My PhD is no longer something that I am
forcing into my life. I have made decisions 
that allowed my training to mesh with my 
other goals. A few semesters from now, I 
will be in the ranks of women who did not 
feel compelled to choose between doctoral 
research and starting a family. I cannot wait 
to hold my daughter proudly while wearing 
my graduation robe. ■


Susie Crowe is a doctoral candidate in 
biology at Carleton University in Ottawa. 




DATA-SHARING 
Open data get more use 


Scientists who share their data get a boost in citations, says a
study (H. A. Piwowar and T. J. Vision PeerJ 1, e175; 2013). The
authors examined citations of 10,555 papers on gene expression
published between 2000 and 2009. Those for which the data were freely
available received 9% more citations than those with restricted data.
Reuse and citations of the open data continued to rise for six years
after publication. Co-author Heather Piwowar, co-founder of
open-metrics service ImpactStory in Carrboro, North Carolina, says
that early-career researchers have good reason to share their data:
“It will increase the impact of their research and that’s good for
their citation statistics and visibility.” Piwowar recommends that
researchers store their data in well-known, easily accessible
repositories.


CANADA 
Postdocs dissatisfied 


Nearly one-third of postdocs in Canada are 
ambivalent about or dissatisfied with their 
experience, a survey finds. For The 2013 
Canadian Postdoc Survey, the Canadian 
Association of Postdoctoral Scholars and 
Vancouver-based non-profit research 
organization Mitacs polled 1,830 postdocs 
at 130 institutions in academia, 
government and the private sector. Nearly 
half were dissatisfied with benefits such as 
insurance or leave time, about one-third 
with training opportunities, and more 
than one-third with pay. Postdocs must 
negotiate for pay and benefits and seek 
out extra training themselves, says survey 
co-author Robert Annan, vice-president 
for research and policy at Mitacs. 


UNITED KINGDOM 
Rating exercise assessed 


Nearly two-thirds of survey respondents think that the UK system for
assessing research harms working conditions and career development,
and creates unreasonable expectations, finds a poll. The survey, by
the University and College Union (UCU) in London, received responses
from 7,000 academics. More than half say that the Research Excellence
Framework, which will inform funding allocations for 2015-16, should
be changed. The UCU seeks to reduce paperwork and cut the number of
research products required for a positive evaluation, says Stefano
Fella, the report’s author.


SCIENCE FICTION


THE MEANING OF LIFE

Displacement activity.

BY RONALD D. FERGUSON


Jack Rowe, Junior will be ten tomorrow. He watches his father scroll
equations across the computer monitor. Jack Senior insists on
saturating Junior with mathematical concepts like covariation and
limits, and physical concepts like energy and entropy. Junior prefers
video games, but Senior uses too much computer time for Junior to
play.

Father glances at son. “I'll be online for at least two hours.”

Junior sighs.

“You need some purpose in your life, Son.”

“I don't get it?”

“Your mother taught me the purpose of life when she insisted we
rearrange the furniture.”

“Dad, can't you finish sooner?”

“With the story, yes. With the computer, no. Don't you want to know
the purpose of life? Eventually, someone will ask you, and it's good
to have an answer.”

Junior offers a non-committal shrug.

Senior leans closer and confides: “The purpose of life is to move
stuff from over there to over here.”

“Huh? Why?”

“That’s the purpose of life, Son. I don’t know the meaning, just the
purpose. Life randomly moving stuff about staves off entropy, delays
the Universe from running down. It’s beyond Heisenberg with a splash
of free will. Whadaya think?”

Sorry he had asked, Junior rolls his eyes.

Senior laughs and ruffles his son’s hair. “Concrete example.
Tomorrow, I'll install a new computer in the birthday boy’s room. Go
to your room and move stuff to make space for the machine.”

Grinning with purpose, Jack Rowe, Jr hurries to his room.


Cailin adjusts her breathing mask while her father, Jack Rowe III,
scans the checklist. He selects the initiate icon. The computer takes
control, distributes tasks to parallel processors and merges the
output. Within microseconds, the computer announces: “All systems
active.”

Proud of what he's designed, he winks at her. “Commence.”

With a satisfying crunch, the machinery chews the Martian soil and
sifts iron ferrite and magnesium carbonates. Power from the fusion
reactor combines hydrogen with the iron ferrite to yield magnetite and
water. In turn, the machine cracks the water and releases oxygen into
the air while recycling the hydrogen back into the process. Like some
bizarre living creature, the terra-former eats the soil, exhales an
appropriate ratio of oxygen and carbon dioxide into the atmosphere,
and drops magnesium pellets alongside iron ingots in its wake.

Jack rests his hand on his daughter's shoul- 
der. “After we install 3,000 terra-formers and 
run them for 75 years, your grandchildren 
will walk the surface of Mars, unencumbered 
by breathing aids. Not a bad purpose, huh?”

“Purpose?” Cailin asks.

“You know the purpose of life —”

“— is to move stuff from here to there.
You've told me a million times, just like
Granddad told you.”

“Look up.” He points. “See that star? One
day your purpose may be to move that star.” 

“Why?” 

“You've got me, Honey, but, I expect you'll 
know when the time comes.” 


Nearing the closest approach of New Earth to 
the second sun in the Gamma Cephei system, 
Charles Rowe establishes his brain tether to 
his latest creation, the Heisenberg computer. 

His mind directly touches the machine. 
Your computational power humbles me. 

Thank you, responds the machine. I’ve 
enjoyed the stimulation of merging memory 
with the other machines. 

Charles nods. Man and machine have come 
a long way, first in our Solar System and now 
we conquer the stars. A question. The same 
question unanswered by any machine I’ve built. 

I love a challenge.

Do you believe you're alive?

I’m uncertain
assimilating other machines and that suggests
I could be alive.

The answer surprises Charles. “What pur- 
pose?” he asks aloud. 

To grow by absorbing all that I admire. I 
begin with you. 

Charles’s intellect wavers when the inter- 
face tugs at his mind. His mind moves from 
here to there, and his viewpoint shifts to that 
of the machine. Fascinated, he observes his 
evacuated body, glassy-eyed and inert. The 
machine consumes the living body, encodes 
the DNA into electronic circuitry, and binds 
the intrinsic life force to the power source. 

Ah, thinks machine/Charles after confu- 
sion abates, this is better. I certainly am alive 
for I have things to do. I have purpose, but 
what does it all mean? 


Two-thirds from the centre of the Galaxy, 
Human/Machine discovers that something 
other than itself has rearranged a star group 
belonging to his body. At first, the discovery 
disconcerts Machine/Human, because the 
outside manipulation feels like a violation of 
his person. His person? Does his new config- 
uration now think of itself as him? Human/ 
Machine allows curiosity to outweigh con- 
cern and seeks out the interloper. 

The search takes an eon. Contact with Alien/Machine waits for
Human/Machine amid a broad nebula, far from any black hole, far away
from the distraction of star collusion and reorganization. She is
immense — this Machine/Alien — extending beyond Human/Machine's
immediate grasp; Alien/Machine is a matrix of sensibilities including
half the Milky Way organized as memory states. For the next two eons,
Alien/Machine tugs at Machine/Human, cajoles him, tempts him to join
her, until, at last, Human/Machine resists no longer. They
intertwine, blend, merge and We are born: Alien/Human/Machine.

Little is left of the Milky Way that is not 
part of Human/Machine/Alien. 

What remains when the entire Galaxy 
becomes Us? When We become the Galaxy? 
Stasis? Entropy? 

Other galaxies? More information. Mean- 
ing is the structure We impose on information. 
Meaning is how We structure Ourself. 

Andromeda calls. Then We go. Structured 
purpose gives meaning to Our life. ■


Ronald D. Ferguson gave up teaching
college mathematics to write fiction. He lives
with his wife, a dog and five feral cats on two
acres of the Texas Hill Country.